Abstract

Inner and outer bounds are derived on the optimal performance of fixed length block-codes on discrete memoryless
channels with feedback and errors-and-erasures decoding. First an inner bound is derived using a two phase encoding
scheme with communication and control phases together with the optimal decoding rule for the given encoding scheme,
among decoding rules that can be represented in terms of pairwise comparisons between the messages. Then an outer
bound is derived using a generalization of the straight-line bound to errors-and-erasures decoders and the optimal error
exponent trade-off of a feedback encoder with two messages on a DMC. Finally upper and lower bounds are derived
for the optimal erasure exponent of error free block-codes in terms of the rate.

I. Introduction: 

Shannon showed in [28] that the capacity of discrete memoryless channels (DMCs) does not increase even when
a noiseless and delay free feedback link is available from receiver to transmitter. On symmetric DMCs the sphere
packing exponent bounds the error exponent of fixed length block-codes from above, as shown by Dobrushin¹ in [10].
Thus relaxations like errors-and-erasures decoding or variable length coding are needed for feedback to increase the
error exponent of block-codes at rates larger than the critical rate on symmetric DMCs. In this work we investigate one
such relaxation, namely errors-and-erasures decoding, and find inner and outer bounds to the optimal error exponent
erasure exponent trade-off. This analysis complements the research on two related block coding schemes: variable
length block coding and errors-and-erasures decoding for block-codes without feedback. We start with a very brief
overview of the previous work on these problems to motivate our investigation.

Burnashev [5] was the first to consider variable-length block-codes with feedback, instead of fixed
length ones. He obtained the exact expression for the error exponent at all rates. Later Yamamoto and Itoh [32]
suggested a coding scheme which achieves the best error exponent for variable-length block-codes with feedback
by using a fixed length block-code with errors-and-erasures decoding repetitively, until a non-erasure decoding
occurs.² In fact any fixed length block-code with erasures can be used repetitively, as was done in [32], to get a
variable length block-code with essentially the same error exponent as the original fixed length block-code. Thus
Burnashev's result can be reinterpreted to give an upper bound to the error exponent achievable by fixed length
block-codes with erasures. Furthermore this upper bound is achieved by the fixed length block-codes with erasures
described in [32], when the erasure probability decays to zero sub-exponentially with block length. However the
techniques used in this stream of work are insufficient for deriving proper inner or outer bounds for the situation
when the erasure probability decays exponentially with block length. As explained in the next paragraph, the case with
strictly positive erasure exponent is important both for engineering applications and for a better understanding of
soft decoding with feedback. Our investigation provides proper tools for such an analysis and results in inner and
outer bounds to the trade-off between error and erasure exponents, while recovering all previously known results for
the zero erasure exponent case.

When considered together with higher layers, the codes in the physical layer are part of a variable length/delay 
communication scheme with feedback. However in the physical layer itself fixed length block-codes are used instead 
of variable length ones because of their amenability to modular design and robustness against the noise in the feedback 
link. In such an architecture retransmissions affect the performance of higher layers. The average transmission time

1 Later Haroutunian [16] established an upper bound on the error exponent of block-codes with feedback. This upper bound is equal to
the sphere packing exponent for symmetric channels but it is strictly larger than the sphere packing exponent for non-symmetric channels.
2 Including erasures will not result in an increase in the exponent for variable-length block-codes with feedback.



is only a first order measure of this effect: as long as the erasure probability vanishes with increasing block length,
the average transmission time will essentially be equal to the block length of the fixed length block-code. Thus with
an analysis like the one in [32], the cost of retransmissions is ignored as long as the erasure probability goes to
zero with increasing block length. In a communication system with multiple layers, however, retransmissions usually
have costs beyond their effect on average transmission time, which are described by constraints on the probability
distribution of the decoding time. Knowledge of the error exponent erasure exponent trade-off is useful in coming up
with designs that meet those constraints. An example of this phenomenon is variable length block coding schemes with
hard deadlines for decoding time, which have already been investigated by Gopala et al. [15] for block-codes without
feedback. They used a block coding scheme with erasures and resent the message whenever an erasure occurred.
But because of the hard deadline they employ this scheme only for some fixed number of trials. If all those trials
fail, i.e. lead to an erasure, they use a non-erasure block-code. Using the error exponent erasure exponent trade-off
they were able to obtain the best overall error performance for the given architecture.

This brings us to the second stream of research we complement with our investigation: errors-and-erasures decoding
for block-codes without feedback. Forney [13] was the first to consider errors-and-erasures decoding without
feedback. He obtained an achievable trade-off between the exponents of error and erasure probabilities. Then Csiszár
and Körner [9] achieved the same performance using universal coding and decoding algorithms. Later Telatar and
Gallager [31] introduced a strict improvement on certain channels over the results presented in [13] and [9]. Recently
there has been a revived interest in errors-and-erasures decoding for universally achievable performances [21], [20],
for alternative methods of analysis [19], for extensions to channels with side information and for implementation
with linear block-codes [17]. The encoding schemes in these codes do not have access to any feedback. However if the
transmitter learns whether the decoded message was an erasure or not, it can resend the message whenever it is
erased. Because of this block retransmission variant these problems are sometimes called decision feedback problems.

We complement the results on the error exponent erasure exponent trade-off without feedback and the results about
the error exponent of variable length block-codes with feedback, by finding inner and outer bounds to the error exponent
erasure exponent trade-off of fixed length block-codes with feedback. We first introduce our model and notation in
Section II. Then in Section III we derive a lower bound using a two phase coding algorithm similar to the
one described by Yamamoto and Itoh in [32], and a decoding rule and analysis techniques inspired by Telatar's in [30]
for the non-feedback case. Note that the analysis and the decoding rule in [30] are tailored to a single phase scheme
without feedback, and the two phase scheme of [32] is tuned specifically to the zero erasure exponent case; coming up
with a framework in which both ideas can be used efficiently is the main technical challenge here. In Section IV we
first extend the straight line bound idea introduced by Shannon, Gallager and Berlekamp in [29] to block-codes with
erasures. Then we use it together with an outer bound on the error exponent trade-off between two codewords with
feedback to establish an outer bound. In Section V we first introduce error free block-codes with erasures and discuss
their relation to fixed length block-codes with errors and erasures; then we present inner and outer bounds to
the erasure exponent of error free block-codes and point out its relation to the error exponent erasure exponent
trade-off.

Before starting the presentation of our analysis, let us make a brief digression and discuss two channel models
in which the use of feedback has been investigated for block-codes without erasures. The first channel model is the
well known additive white Gaussian noise channel (AWGNC) model. In AWGNCs, if the power constraint $\mathcal{P}$ is on
the expected value of the energy spent on a block, $E[S_n]$, i.e. if the power constraint is of the form
$E[S_n] \le \mathcal{P} n$, the error probability can be made to decay faster than any exponential function of the block
length $n$. Schalkwijk and Kailath suggested a coding algorithm in [27] which achieves a doubly exponential decay in
error probability for continuous time AWGNCs, i.e. the infinite bandwidth case. Later Schalkwijk [25] modified that
scheme to achieve the same performance in discrete time AWGNCs, i.e. the finite bandwidth case. Concatenating the
Schalkwijk and Kailath scheme with pulse amplitude modulation stages gives a multi-fold decrease in the error
probability [24], [33]. However this behavior relies on the non-existence of any amplitude limit, the particular form
of the power constraint and the noise free nature of the feedback link. First of all, as observed in [5] and [22], when
there is an amplitude limit, the error probability decays exponentially with block length. More importantly, if the
power constraint restricts the energy spent in the transmission of each message for all noise realizations, i.e. if the
power constraint is an almost sure power constraint of the form $S_n \le \mathcal{P} n$, then the sphere packing exponent
is still an upper bound to the error exponent for AWGNCs, as shown by Pinsker [24]. Furthermore if the feedback link is
also an AWGNC and if there is a power constraint⁴ on the feedback transmissions, then even in the case when there are
only two messages, the error probability decays only exponentially, as has recently been shown by Kim et al. [18].

The second channel model is the DMC model. Although feedback cannot increase the error exponent for rates
over the critical rate, it can simplify the encoding scheme [33], [12]. Furthermore, for rates below the critical rate it
is possible to improve the error exponent using feedback. Zigangirov [33] has established lower bounds to the error
exponent for BSCs using such a simple encoding scheme. Zigangirov's lower bound is equal to the sphere packing
exponent for all rates in the interval $[R'_{crit}, C]$ where $R'_{crit} < R_{crit}$, and it is strictly larger than
the corresponding non-feedback exponent for rates below $R_{crit}$. Later Burnashev [6] introduced an improvement
to Zigangirov's bound for all positive rates less than $R_{crit}$. D'yachkov [12] generalized Zigangirov's encoding scheme
to general DMCs and established lower bounds to the error exponents for general binary input channels and k-ary
symmetric channels. However it is still an open problem to find a constructive technique that can be used for all
DMCs and that outperforms the random coding bound. As for AWGNCs, there has been a revived interest in the effect of
a noisy feedback link and achievable performances with noisy feedback on DMCs. Burnashev and Yamamoto recently
showed that the error exponent of the BSC increases even with a noisy feedback link [3]. Furthermore Draper
and Sahai [11] investigated the use of a noisy feedback link in variable length schemes.

II. Model and Notation: 

The input and output alphabets of the forward channel are $\mathcal{X}$ and $\mathcal{Y}$, respectively. The channel input
and output symbols at time $t$ will be denoted by $X_t$ and $Y_t$ respectively. Furthermore, the sequences of input and
output symbols from time $t_1$ to time $t_2$ are denoted by $X_{t_1}^{t_2}$ and $Y_{t_1}^{t_2}$. When $t_1 = 1$ we omit $t_1$
and simply write $X^{t_2}$ and $Y^{t_2}$ instead of $X_1^{t_2}$ and $Y_1^{t_2}$. The forward channel is a stationary
memoryless channel characterized by an $|\mathcal{X}|$-by-$|\mathcal{Y}|$ transition probability matrix $W$.

$$P[Y_t \mid X^t, Y^{t-1}] = P[Y_t \mid X_t] = W(Y_t \mid X_t) \qquad \forall t. \qquad (1)$$

The feedback channel is noiseless and delay free, i.e. the transmitter observes $Y_{t-1}$ before transmitting $X_t$.

The message $M$ is drawn from the message set $\mathcal{M}$ with a uniform probability distribution and is given to the
transmitter at time zero. At each time $t \in [1, n]$ the input symbol $X_t(M, Y^{t-1})$ is sent. The sequence of functions
$X_t(\cdot): \mathcal{M} \times \mathcal{Y}^{t-1} \to \mathcal{X}$, which assigns an input symbol to each $m \in \mathcal{M}$
and $y^{t-1} \in \mathcal{Y}^{t-1}$, is called the encoding function.
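The channel model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy encoder `repeat_until_seen` and the use of NumPy's random generator are not from the paper. The encoder is called as $X_t(M, Y^{t-1})$, so it sees the message and every past output, and each output is drawn from the row of $W$ selected by the current input, as in (1).

```python
import numpy as np

def transmit_with_feedback(encoder, message, W, n, rng):
    """Send n symbols over a DMC W; the encoder sees all past outputs (noiseless feedback)."""
    W = np.asarray(W, dtype=float)
    inputs, outputs = [], []
    for _ in range(n):
        # X_t = X_t(M, Y^{t-1}): the message and the past outputs determine the input.
        x_t = encoder(message, tuple(outputs))
        # Y_t is drawn from row x_t of W, independently of the past given X_t.
        y_t = int(rng.choice(W.shape[1], p=W[x_t]))
        inputs.append(x_t)
        outputs.append(y_t)
    return inputs, outputs

def repeat_until_seen(message, past_outputs):
    """Hypothetical binary encoder: send the message bit until it has been received once, then send 0."""
    return 0 if message in past_outputs else message

rng = np.random.default_rng(0)
noiseless = np.eye(2)  # toy channel with W(y|x) = 1 if y == x
xs, ys = transmit_with_feedback(repeat_until_seen, 1, noiseless, 4, rng)
```

On the noiseless toy channel the bit is received at the first use, so the encoder switches to 0 from the second symbol on; with a noisy $W$ the switching time becomes random.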

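After receiving $Y^n$ the receiver decodes $\hat{M}(Y^n) \in \{\mathbf{x}\} \cup \mathcal{M}$ where $\mathbf{x}$ is the erasure
symbol. The conditional erasure and error probabilities $P_{\mathbf{x}|m}$ and $P_{e|m}$ and the unconditional erasure and
error probabilities $P_{\mathbf{x}}$ and $P_e$ are defined as,

$$P_{\mathbf{x}|m} \triangleq P[\hat{M} = \mathbf{x} \mid M = m], \qquad P_{e|m} \triangleq P[\hat{M} \ne M \mid M = m] - P_{\mathbf{x}|m},$$

$$P_{\mathbf{x}} \triangleq P[\hat{M} = \mathbf{x}], \qquad P_e \triangleq P[\hat{M} \ne M] - P_{\mathbf{x}}.$$

Since all the messages are equally likely we have,

$$P_{\mathbf{x}} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} P_{\mathbf{x}|m}, \qquad P_e = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} P_{e|m}.$$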

We use a somewhat abstract but rigorous approach in defining the rate and achievable exponent pairs. A reliable
sequence $Q$ is a sequence of codes indexed by their block lengths such that

$$\lim_{n \to \infty} \left( P_e^{(n)} + P_{\mathbf{x}}^{(n)} + \frac{1}{|\mathcal{M}^{(n)}|} \right) = 0.$$



In other words reliable sequences are sequences of codes whose overall error probability, detected and undetected, 
vanishes and whose size of message set diverges with block length n. 

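Definition 1: The rate, erasure exponent, and error exponent of a reliable sequence $Q$ are given by

$$R_Q \triangleq \liminf_{n \to \infty} \frac{\ln |\mathcal{M}^{(n)}|}{n}, \qquad E_{\mathbf{x}Q} \triangleq \liminf_{n \to \infty} \frac{-\ln P_{\mathbf{x}}^{(n)}}{n}, \qquad E_{eQ} \triangleq \liminf_{n \to \infty} \frac{-\ln P_e^{(n)}}{n}.$$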



3 As Kim et al. [18] call it.

4 This constraint can be an expected or almost sure constraint. 
Haroutunian [16, Theorem 2] has already established a strong converse for erasure free block-codes with feedback,
which in our setting implies that $\lim_{n \to \infty} (P_e^{(n)} + P_{\mathbf{x}}^{(n)}) = 1$ for all codes whose rates are
strictly above the capacity, i.e. $R > C$. Thus we consider only rates that are less than or equal to the capacity,
$R \le C$. For all rates $R$ below capacity and for all non-negative erasure exponents $E_{\mathbf{x}}$, we define the (true)
error exponent $\mathcal{E}_e(R, E_{\mathbf{x}})$ of fixed length block-codes with feedback to be the best error exponent of
the reliable sequences⁵ whose rate is at least $R$ and whose erasure exponent is at least $E_{\mathbf{x}}$.

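Definition 2: $\forall R \le C$ and $\forall E_{\mathbf{x}} \ge 0$ the error exponent $\mathcal{E}_e(R, E_{\mathbf{x}})$ is,

$$\mathcal{E}_e(R, E_{\mathbf{x}}) \triangleq \sup_{Q:\ R_Q \ge R,\ E_{\mathbf{x}Q} \ge E_{\mathbf{x}}} E_{eQ}. \qquad (2)$$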

Note that 

$$\mathcal{E}_e(R, E_{\mathbf{x}}) = \mathcal{E}(R) \qquad \forall E_{\mathbf{x}} > \mathcal{E}(R) \qquad (3)$$

where $\mathcal{E}(R)$ is the (true) error exponent of erasure-free block-codes on DMCs with feedback.⁶ Thus the benefit of
errors-and-erasures decoding is the possible increase in the error exponent as the erasure exponent goes below
$\mathcal{E}(R)$.

Determining $\mathcal{E}(R)$ for all $R$ and for all channels is still an open problem; only upper and lower bounds to
$\mathcal{E}(R)$ are known. Our investigation focuses on quantifying the gains of errors-and-erasures decoding instead of
finding $\mathcal{E}(R)$. Consequently, we restrict ourselves to the region where the erasure exponent is lower than the
error exponent for the encoding scheme.

For future reference let us recall the expressions for the random coding exponent and the sphere packing exponent,

$$E_r(R, P) = \min_{V} D(V \| W | P) + |I(P, V) - R|^+, \qquad E_r(R) = \max_{P} E_r(R, P) \qquad (4)$$

$$E_{sp}(R, P) = \min_{V:\ I(P, V) \le R} D(V \| W | P), \qquad E_{sp}(R) = \max_{P} E_{sp}(R, P) \qquad (5)$$

where $D(V \| W | P)$ stands for the conditional Kullback-Leibler divergence of $V$ and $W$ under $P$, and $I(P, V)$ stands
for the mutual information for input distribution $P$ and channel $V$.
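As a numerical illustration of (4) and (5), the sketch below approximates $E_r(R, P)$ and $E_{sp}(R, P)$ for a binary input, binary output channel by brute-force search over a grid of candidate channels $V$. The grid search, the function names and the example BSC are assumptions made here for illustration; they are not the method of the paper, and the grid only gives an upper approximation of each minimum.

```python
import numpy as np

def cond_kl(V, W, P):
    """Conditional divergence D(V||W|P) in nats; assumes all entries of V and W are positive."""
    return float(np.sum(P[:, None] * V * np.log(V / W)))

def mutual_info(P, V):
    """Mutual information I(P,V) in nats; assumes all entries of V are positive."""
    q = P @ V  # output distribution (PV)_Y
    return float(np.sum(P[:, None] * V * np.log(V / q[None, :])))

def binary_channels(grid):
    """All 2x2 channels V with V(1|0) = a and V(0|1) = b for a, b on the grid."""
    for a in grid:
        for b in grid:
            yield np.array([[1.0 - a, a], [b, 1.0 - b]])

def random_coding_exponent(R, P, W, grid):
    """Grid approximation of E_r(R,P) in (4)."""
    return min(cond_kl(V, W, P) + max(mutual_info(P, V) - R, 0.0)
               for V in binary_channels(grid))

def sphere_packing_exponent(R, P, W, grid):
    """Grid approximation of E_sp(R,P) in (5); inf if no grid channel satisfies I(P,V) <= R."""
    feasible = [cond_kl(V, W, P) for V in binary_channels(grid)
                if mutual_info(P, V) <= R]
    return min(feasible, default=float("inf"))

P = np.array([0.5, 0.5])
W = np.array([[0.9, 0.1], [0.1, 0.9]])  # BSC with crossover probability 0.1
grid = np.linspace(0.01, 0.99, 99)
er_zero = random_coding_exponent(0.0, P, W, grid)  # close to ln(1.25) for this channel
```

For this BSC with uniform input the minimizer at $R = 0$ is a BSC with crossover probability 0.25, which lies on the grid, so the grid value agrees with the closed form $\ln 1.25$. For larger alphabets a convex solver would be a more sensible choice than a grid.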

We denote the $\mathcal{Y}$ marginal of a distribution like $P(x)V(y|x)$ by $(PV)_Y$. The support of a probability
distribution $P$ is denoted by $\mathrm{supp}P$.

III. An Achievable Error Exponent - Erasure Exponent Trade Off 

In this section we establish a lower bound to the achievable error exponent as a function of erasure exponent
and rate. We use a two phase encoding scheme similar to the one described by Yamamoto and Itoh in [32] together
with a decoding rule similar to the one described by Telatar in [30]. In the first phase, the transmitter uses a
fixed-composition code of length $\alpha n$ and rate $R/\alpha$. At the end of the first phase, the receiver performs
maximum mutual information decoding to obtain a tentative decision $\tilde{M}$. The transmitter knows $\tilde{M}$ because
of the feedback link. In the $(n - n_1)$ long second phase the transmitter confirms the tentative decision by sending
the accept codeword if $\tilde{M} = M$, and rejects it by sending the reject codeword otherwise. At the end of the second
phase the receiver either declares an erasure or declares the tentative decision as the decoded message. The receiver
declares the tentative decision as the decoded message only when the tentative decision "dominates" all other messages.
The word "dominate" will be made precise in Section III-C. Our scheme is inspired by [32] and [30]. However, unlike
[32], our decoding rule makes use of the outputs of both phases, instead of the output of just the second phase, while
deciding between declaring an erasure and declaring the tentative decision as the final one; and unlike [30], our
encoding scheme is a feedback encoding scheme with two phases.

In the rest of this section, we analyze the performance of this coding architecture and derive the achievable error
exponent expression in terms of a given rate $R$, erasure exponent $E_{\mathbf{x}}$, time sharing constant $\alpha$,
communication phase type $P$, control phase type (joint empirical type of the accept codeword and reject codeword) $\Pi$
and domination rule $\succ$. Then we optimize over $\succ$, $\Pi$, $P$ and $\alpha$ to obtain an achievable error exponent
expression as a function of rate $R$ and erasure exponent $E_{\mathbf{x}}$.

5 We restrict ourselves to the reliable sequences in order to ensure a finite error exponent at zero erasure exponent. Note that a decoder which
always declares erasures has zero erasure exponent and infinite error exponent.

6 In order to see this consider a reliable sequence with erasures $Q$ and replace its decoding algorithm by any erasure free one, $Q'$, such that
$\hat{M}'(y^n) = \hat{M}(y^n)$ if $\hat{M}(y^n) \ne \mathbf{x}$. Then $P_{eQ'}^{(n)} \le P_{eQ}^{(n)} + P_{\mathbf{x}Q}^{(n)}$; thus $E_{eQ'} \ge \min\{E_{\mathbf{x}Q}, E_{eQ}\}$ and $R_{Q'} = R_Q$. This together with the definition
of $\mathcal{E}(R)$ leads to equation (3).
A. Fixed-Composition Codes and The Packing Lemma

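We start with a very brief overview of certain properties of types; a thorough treatment of the idea of types can be
found in [9]. The empirical distribution of an $x^n \in \mathcal{X}^n$ is called the type of $x^n$ and the empirical
distribution of transitions from an $x^n \in \mathcal{X}^n$ to a $y^n \in \mathcal{Y}^n$ is called the conditional type:⁷

$$P_{x^n}(\tilde{x}) = \frac{1}{n} \sum_{t=1}^{n} 1\{x_t = \tilde{x}\} \qquad \forall \tilde{x} \in \mathcal{X} \qquad (6)$$

$$V_{y^n|x^n}(\tilde{y} | \tilde{x}) = \frac{1}{n P_{x^n}(\tilde{x})} \sum_{t=1}^{n} 1\{x_t = \tilde{x}\} 1\{y_t = \tilde{y}\} \qquad \forall \tilde{y} \in \mathcal{Y},\ \forall \tilde{x} \text{ s.t. } P_{x^n}(\tilde{x}) > 0. \qquad (7)$$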

For any probability transition matrix $W: \mathrm{supp}P_{x^n} \to \mathcal{Y}$ we have⁸

$$\prod_{t=1}^{n} W(y_t | x_t) = e^{-n \left( D(V_{y^n|x^n} \| W | P_{x^n}) + H(V_{y^n|x^n} | P_{x^n}) \right)} \qquad (8)$$
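The definitions (6) and (7) and the identity (8) can be checked numerically on a short example. The sketch below is an illustration only; the function names and the sample sequences are assumptions made here. It computes the type and conditional type of a pair of sequences and compares the product of channel transition probabilities with the exponential expression in (8).

```python
import numpy as np

def empirical_type(x, nx):
    """Type P_{x^n} of eq. (6)."""
    return np.bincount(x, minlength=nx) / len(x)

def conditional_type(x, y, nx, ny):
    """Conditional type V_{y^n|x^n} of eq. (7); rows of unused input letters are left at zero."""
    counts = np.zeros((nx, ny))
    for xt, yt in zip(x, y):
        counts[xt, yt] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def cond_kl(V, W, P):
    """D(V||W|P) in nats with the convention 0 ln 0 = 0."""
    mask = V > 0
    return float(np.sum((P[:, None] * V)[mask] * np.log(V[mask] / W[mask])))

def cond_entropy(V, P):
    """H(V|P) in nats with the convention 0 ln 0 = 0."""
    mask = V > 0
    return float(-np.sum((P[:, None] * V)[mask] * np.log(V[mask])))

x = np.array([0, 0, 1, 1, 0, 1])
y = np.array([0, 1, 1, 1, 0, 0])
W = np.array([[0.9, 0.1], [0.2, 0.8]])

P = empirical_type(x, 2)
V = conditional_type(x, y, 2, 2)
lhs = float(np.prod(W[x, y]))  # product of W(y_t | x_t)
rhs = float(np.exp(-len(x) * (cond_kl(V, W, P) + cond_entropy(V, P))))
```

Both sides agree up to floating point error, because $D + H$ equals the empirical average of $-\ln W(y_t | x_t)$.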

The set of all $y^n$'s with the same conditional type $V$ with respect to $x^n$ is called the $V$-shell of $x^n$ and
denoted by $T_V(x^n)$:

$$T_V(x^n) = \{y^n : V_{y^n|x^n} = V\}. \qquad (9)$$

Note that for any transition probabilities from $\mathcal{X}$ to $\mathcal{Y}$ the total probability of $T_V(x^n)$ cannot
exceed one. Thus by assuming that the transition probabilities are $V$ and using equation (8) we can conclude that,

$$|T_V(x^n)| \le e^{n H(V | P_{x^n})}. \qquad (10)$$

Codes whose codewords all have the same empirical distribution, $P_{x^n(m)} = P$ $\forall m \in \mathcal{M}$, are called
fixed-composition codes. In Section III-D we will describe the error and erasure events in terms of the intersections of
$V$-shells of different codewords. For doing that let us define $F^{(n)}(V, \hat{V}, m)$ as the intersection of the
$V$-shell of $x^n(m)$ and the $\hat{V}$-shells of the other codewords:

$$F^{(n)}(V, \hat{V}, m) \triangleq T_V(x^n(m)) \cap \bigcup_{\tilde{m} \ne m} T_{\hat{V}}(x^n(\tilde{m})). \qquad (11)$$

The following packing lemma, proved by Csiszár and Körner [9, Lemma 2.5.1], claims the existence of a code with
a guaranteed upper bound on the size of $F^{(n)}(V, \hat{V}, m)$.

Lemma 1: For every block length $n \ge 1$, rate $R > 0$ and type $P$ satisfying $H(P) > R$, there exist at least
$\lfloor e^{n(R - \delta_n)} \rfloor$ distinct type $P$ sequences in $\mathcal{X}^n$ such that for every pair of stochastic
matrices $V: \mathrm{supp}P \to \mathcal{Y}$, $\hat{V}: \mathrm{supp}P \to \mathcal{Y}$ and $\forall m \in \mathcal{M}$

$$|F^{(n)}(V, \hat{V}, m)| \le |T_V(x^n(m))| \, e^{-n |I(P, \hat{V}) - R|^+}$$



where $\delta_n = \frac{\ln 4 + (4|\mathcal{X}| + 6|\mathcal{X}||\mathcal{Y}|) \log(n+1)}{n}$.

The above lemma is stated in a slightly different way by the authors of [9], for a fixed $\delta$ and large enough $n$.
However, this form follows immediately from their proof.

If we use Lemma 1 together with equations (8) and (10), we can bound the conditional probability of observing a
$y^n \in F^{(n)}(V, \hat{V}, m)$ when $M = m$ as follows.

Corollary 1: In a code satisfying Lemma 1, when message $m \in \mathcal{M}$ is sent, the probability of getting a
$y^n \in T_V(x^n(m))$ which is also in $T_{\hat{V}}(x^n(\tilde{m}))$ for some $\tilde{m} \in \mathcal{M}$ such that
$\tilde{m} \ne m$ is bounded as follows,

$$P[F^{(n)}(V, \hat{V}, m) \mid M = m] \le e^{-n \eta(R, P, V, \hat{V})} \qquad (12)$$

where

$$\eta(R, P, V, \hat{V}) \triangleq D(V \| W | P) + |I(P, \hat{V}) - R|^+. \qquad (13)$$

7 Note that $P_{x^n}$ corresponds to a distribution on $\mathcal{X}$ for all $x^n \in \mathcal{X}^n$, whereas $V_{y^n|x^n}$ determines a channel from the support of $P_{x^n}$ to $\mathcal{Y}$.
8 Note that for any $W: \mathcal{X} \to \mathcal{Y}$ there is a unique consistent $W': \mathrm{supp}P_{x^n} \to \mathcal{Y}$.
B. Coding Algorithm

In the first phase, the communication phase, we use a length $n_1 = \lceil \alpha n \rceil$ type $P$ fixed-composition code
with $\lfloor e^{n_1 (\frac{R}{\alpha} - \delta_{n_1})} \rfloor$ codewords which satisfies the property described in Lemma 1.
At the end of the first phase the receiver makes a tentative decision by choosing the codeword that has the maximum
empirical mutual information with the output sequence $Y^{n_1}$. If there is a tie, i.e. if more than one codeword has
the maximum empirical mutual information, the receiver chooses the codeword which has the lowest index:

$$\tilde{M} = \left\{ m : \begin{array}{ll} I(P, V_{Y^{n_1}|x^{n_1}(m)}) > I(P, V_{Y^{n_1}|x^{n_1}(\tilde{m})}) & \forall \tilde{m} < m \\ I(P, V_{Y^{n_1}|x^{n_1}(m)}) \ge I(P, V_{Y^{n_1}|x^{n_1}(\tilde{m})}) & \forall \tilde{m} > m \end{array} \right\} \qquad (14)$$

In the remaining $(n - n_1)$ time units, the transmitter sends the accept codeword $x_{n_1+1}^{n}(a)$ if $\tilde{M} = M$ and
sends the reject codeword $x_{n_1+1}^{n}(r)$ otherwise.
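The tentative decision and the choice of the control phase codeword can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the zero-based message indices, the tolerance `tol` used to detect ties in floating point, and the toy codebook are all assumptions made here.

```python
import numpy as np

def empirical_mutual_info(x, y, nx, ny):
    """Mutual information of the joint type of (x^n, y^n), i.e. I(P, V_{y^n|x^n}), in nats."""
    joint = np.zeros((nx, ny))
    for xt, yt in zip(x, y):
        joint[xt, yt] += 1
    joint /= len(x)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask])))

def tentative_decision(codebook, y1, nx, ny, tol=1e-12):
    """Maximum empirical mutual information decoding, ties broken by the lowest index."""
    scores = [empirical_mutual_info(x, y1, nx, ny) for x in codebook]
    best = max(scores)
    return next(m for m, s in enumerate(scores) if s >= best - tol)

def second_phase_input(message, tentative, accept_cw, reject_cw):
    """Control phase: accept codeword if the tentative decision is correct, reject codeword otherwise."""
    return accept_cw if tentative == message else reject_cw

# Toy fixed-composition codebook: every codeword has type (1/2, 1/2).
codebook = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
tie_case = tentative_decision(codebook, [0, 1, 0, 1], 2, 2)   # codewords 0 and 1 tie
clear_case = tentative_decision(codebook, [0, 0, 1, 1], 2, 2)
```

In the tie case the output is perfectly correlated with codeword 1 and perfectly anti-correlated with codeword 0, so both have empirical mutual information $\ln 2$ and the lower index is chosen. Note that the encoder needs only the value of the tentative decision from the feedback link.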

Note that our encoding scheme does not use the feedback link actively for encoding, either within the first phase or
within the second phase. It does not even change the codewords it uses for accepting or rejecting the tentative decision
depending on the observation in the first phase. Feedback is only used to reveal the tentative decision to the transmitter.

The accept and reject codewords have joint type $\Pi(\tilde{x}, \tilde{\tilde{x}})$, i.e. the ratio of the number of time
instances in which the accept codeword has an $\tilde{x} \in \mathcal{X}$ and the reject codeword has an
$\tilde{\tilde{x}} \in \mathcal{X}$ to the length of the codewords, $(n - n_1)$, is $\Pi(\tilde{x}, \tilde{\tilde{x}})$. The
joint conditional type of the output sequence in the second phase, $U_{y_{n_1+1}^{n}}$, is the empirical conditional
distribution of $y_{n_1+1}^{n}$. We call the set of all output sequences $y_{n_1+1}^{n}$ whose joint conditional type is $U$
the $U$-shell and denote it by $T_U$.

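As we did in Corollary 1, we can upper bound the probability of $U$-shells. Note that if $y_{n_1+1}^{n} \in T_U$ then,

$$P[Y_{n_1+1}^{n} = y_{n_1+1}^{n} \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(a)] = e^{-(n - n_1) \left( D(U \| W_a | \Pi) + H(U | \Pi) \right)}$$

$$P[Y_{n_1+1}^{n} = y_{n_1+1}^{n} \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(r)] = e^{-(n - n_1) \left( D(U \| W_r | \Pi) + H(U | \Pi) \right)}$$

where $x_{n_1+1}^{n}(a)$ is the accept codeword, $x_{n_1+1}^{n}(r)$ is the reject codeword,
$W_a(y | \tilde{x}, \tilde{\tilde{x}}) = W(y | \tilde{x})$ and $W_r(y | \tilde{x}, \tilde{\tilde{x}}) = W(y | \tilde{\tilde{x}})$.
Noting that $|T_U| \le e^{(n - n_1) H(U | \Pi)}$, we get:

$$P[T_U \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(a)] \le e^{-(n - n_1) D(U \| W_a | \Pi)} \qquad (15a)$$

$$P[T_U \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(r)] \le e^{-(n - n_1) D(U \| W_r | \Pi)}. \qquad (15b)$$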

C. Decoding Rule 

For an encoder like the one in Section III-B, a decoder that depends only on the conditional types of $Y^{n_1}$ for the
different codewords in the communication phase, i.e. the $V_{Y^{n_1}|x^{n_1}(m)}$'s for $m \in \mathcal{M}$, the conditional
type of the channel output in the control phase, i.e. $U_{Y_{n_1+1}^{n}}$, and the indices of the codewords can achieve the
minimum error probability for a given erasure probability. However finding that decoder becomes an analytically
intractable problem early on. Instead, we restrict ourselves to decoders that can be written in terms of pairwise
comparisons between messages given $Y^n$. Furthermore we assume that these pairwise comparisons depend only on the
conditional types of $Y^{n_1}$ for the messages compared, the conditional output type in the control phase and the indices
of the messages. Thus if the triplet corresponding to the tentative decision,
$(V_{Y^{n_1}|x^{n_1}(\tilde{M})}, U_{Y_{n_1+1}^{n}}, \tilde{M})$, dominates all other triplets of the form
$(V_{Y^{n_1}|x^{n_1}(m)}, U_{Y_{n_1+1}^{n}}, m)$ for $m \ne \tilde{M}$, the tentative decision becomes final; else an erasure
is declared:⁹

$$\hat{M} = \begin{cases} \tilde{M} & \text{if } \forall m \ne \tilde{M}:\ (V_{Y^{n_1}|x^{n_1}(\tilde{M})}, U_{Y_{n_1+1}^{n}}, \tilde{M}) \succ (V_{Y^{n_1}|x^{n_1}(m)}, U_{Y_{n_1+1}^{n}}, m) \\ \mathbf{x} & \text{if } \exists m \ne \tilde{M} \text{ s.t. } (V_{Y^{n_1}|x^{n_1}(\tilde{M})}, U_{Y_{n_1+1}^{n}}, \tilde{M}) \nsucc (V_{Y^{n_1}|x^{n_1}(m)}, U_{Y_{n_1+1}^{n}}, m) \end{cases} \qquad (16)$$

The binary relation $\succ$ is such that if $(V, U, m)$ dominates $(\hat{V}, U, \tilde{m})$ then $(\hat{V}, U, \tilde{m})$
does not dominate $(V, U, m)$:

$$(V, U, m) \succ (\hat{V}, U, \tilde{m}) \Rightarrow (\hat{V}, U, \tilde{m}) \nsucc (V, U, m).$$

This property is a necessary and sufficient condition for a binary relation to be a domination rule. The decoder given
in (16), however, either accepts or rejects the tentative decision $\tilde{M}$ given in (14). Consequently its domination
rule also satisfies the following two properties:

9 Note that the conditional probability $P[Y^n \mid M = m]$ is only a function of the corresponding $V_{Y^{n_1}|x^{n_1}(m)}$ and $U_{Y_{n_1+1}^{n}}$. Thus all decoding rules that
accept or reject the tentative decision $\tilde{M}$ based on a threshold test on the likelihood ratios $\frac{P[Y^n \mid M = \tilde{M}]}{P[Y^n \mid M = m]}$ for $m \ne \tilde{M}$ are in this family of decoding
rules.
(a) If the empirical mutual informations of the messages in the communication phase are not equal, only the message
with the larger mutual information can dominate the other one.

(b) If the empirical mutual informations of the messages in the communication phase are equal, only the message with
the lower index can dominate the other one.

For any such binary relation there is a corresponding decoder of the form given in equation (16). In our scheme we
either use the trivial domination rule leading to the trivial decoder $\hat{M} = \tilde{M}$ or the domination rule given in
equation (17), both of which satisfy these conditions.

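$$(V, U, m) \succ (\hat{V}, U, \tilde{m}) \Leftrightarrow \begin{cases} I(P, V) > I(P, \hat{V}) \text{ and } \alpha \eta(\tfrac{R}{\alpha}, P, V, \hat{V}) + (1 - \alpha) D(U \| W_a | \Pi) \le E_{\mathbf{x}} & \text{if } m > \tilde{m} \\ I(P, V) \ge I(P, \hat{V}) \text{ and } \alpha \eta(\tfrac{R}{\alpha}, P, V, \hat{V}) + (1 - \alpha) D(U \| W_a | \Pi) \le E_{\mathbf{x}} & \text{if } m < \tilde{m} \end{cases} \qquad (17)$$

where $\eta(R, P, V, \hat{V})$ is given by equation (13).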
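A pairwise domination test of this kind can be sketched in Python. The sketch assumes the following reading of rule (17): $(V, U, m)$ dominates $(\hat{V}, U, \tilde{m})$ when $V$ has the larger empirical mutual information (with ties going to the lower index) and $\alpha \eta(R/\alpha, P, V, \hat{V}) + (1 - \alpha) D(U \| W_a | \Pi) \le E_{\mathbf{x}}$. The function names, the flattening of input pairs into a single index and the example values are assumptions made here for illustration.

```python
import numpy as np

def cond_kl(V, W, P):
    """D(V||W|P) in nats, with 0 ln 0 = 0; assumes W > 0 wherever P*V > 0."""
    joint = P[:, None] * V
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(V[mask] / W[mask])))

def mutual_info(P, V):
    """I(P,V) in nats."""
    q = np.broadcast_to(P @ V, V.shape)
    joint = P[:, None] * V
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(V[mask] / q[mask])))

def eta(rate, P, V, V_hat, W):
    """eta(R,P,V,V_hat) = D(V||W|P) + |I(P,V_hat) - R|^+ as in (13)."""
    return cond_kl(V, W, P) + max(mutual_info(P, V_hat) - rate, 0.0)

def dominates(V, m, V_hat, m_tilde, U, *, R, E_x, alpha, P, W, W_a, Pi):
    """Check whether (V,U,m) dominates (V_hat,U,m_tilde) under the assumed reading of (17)."""
    i_m, i_mt = mutual_info(P, V), mutual_info(P, V_hat)
    # The lower index wins ties in empirical mutual information.
    info_ok = i_m > i_mt if m > m_tilde else i_m >= i_mt
    cost = alpha * eta(R / alpha, P, V, V_hat, W) + (1 - alpha) * cond_kl(U, W_a, Pi)
    return info_ok and cost <= E_x

P = np.array([0.5, 0.5])
W = np.array([[0.9, 0.1], [0.1, 0.9]])
Pi = np.array([0.0, 0.5, 0.5, 0.0])           # pairs (0,0),(0,1),(1,0),(1,1): codewords differ everywhere
W_a = np.repeat(W, 2, axis=0)                 # W_a(y | x_acc, x_rej) = W(y | x_acc)
W_r = np.tile(W, (2, 1))                      # W_r(y | x_acc, x_rej) = W(y | x_rej)
V_indep = np.array([[0.5, 0.5], [0.5, 0.5]])  # conditional type of an unrelated codeword
params = dict(R=0.1, E_x=0.05, alpha=0.5, P=P, W=W, W_a=W_a, Pi=Pi)

typical_accept = dominates(W, 1, V_indep, 2, W_a, **params)
reverse = dominates(V_indep, 2, W, 1, W_a, **params)
looks_like_reject = dominates(W, 1, V_indep, 2, W_r, **params)
```

When the control phase output looks like the accept codeword was sent ($U = W_a$) the typical message dominates; when it looks like the reject codeword was sent ($U = W_r$) the divergence term exceeds $E_{\mathbf{x}}$ and domination fails, which is what leads to an erasure in the decoder (16). Exact float comparison of mutual informations is a simplification; a real implementation would compare integer type counts.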

Among the family of decoders we are considering, i.e. among the decoders that depend only on pairwise
comparisons between conditional types and indices of the messages compared, the decoder given in (16) and (17) is
optimal in terms of the error exponent erasure exponent trade-off. Furthermore, in order to employ this decoding rule,
the receiver needs to determine only the two messages with the highest empirical mutual information in the first phase.
Then the receiver needs to check whether the triplet corresponding to the tentative decision dominates the triplet
corresponding to the message with the second highest empirical mutual information. If it does then, for the rule given
in (17), it is guaranteed to dominate the rest of the triplets too.

D. Error Analysis 

Using an encoder like the one described in Section III-B and a decoder like the one in (16) we achieve the
performance given below. If $E_{\mathbf{x}} \le \alpha E_r(\tfrac{R}{\alpha}, P)$ then the domination rule given in equation
(17) is used in the decoder; else a trivial domination rule that leads to a non-erasure decoder, $\hat{M} = \tilde{M}$, is
used in the decoder.

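Theorem 1: For any block length $n \ge 1$, rate $R$, erasure exponent $E_{\mathbf{x}}$, time sharing constant $\alpha$,
communication phase type $P$ and control phase type $\Pi$, there exists a length $n$ block-code with feedback such that

$$|\mathcal{M}| \ge e^{n(R - \delta_n)}, \qquad P_{\mathbf{x}} \le e^{-n(E_{\mathbf{x}} - \delta'_n)}, \qquad P_e \le e^{-n(E_e(R, E_{\mathbf{x}}, \alpha, P, \Pi) - \delta'_n)}$$

where $E_e(R, E_{\mathbf{x}}, \alpha, P, \Pi)$ is given by,

$$E_e = \begin{cases} \alpha E_r(\tfrac{R}{\alpha}, P) & \text{if } E_{\mathbf{x}} > \alpha E_r(\tfrac{R}{\alpha}, P) \\ \min\limits_{\substack{(V, \hat{V}, U) \in \mathcal{V}: \\ \alpha \eta(\frac{R}{\alpha}, P, V, \hat{V}) + (1 - \alpha) D(U \| W_a | \Pi) \le E_{\mathbf{x}}}} \alpha \eta(\tfrac{R}{\alpha}, P, \hat{V}, V) + (1 - \alpha) D(U \| W_r | \Pi) & \text{if } E_{\mathbf{x}} \le \alpha E_r(\tfrac{R}{\alpha}, P) \end{cases} \qquad (18a)$$

$$\mathcal{V} = \{(V_1, V_2, U) : I(P, V_1) \ge I(P, V_2) \text{ and } (PV_1)_Y = (PV_2)_Y\} \qquad (18b)$$

$$\delta'_n = \frac{(|\mathcal{X}| + 1)^2 |\mathcal{Y}| \log(n + 1)}{n} \qquad (18c)$$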

The optimization problem given in (18) is a convex optimization problem: it is the minimization of a convex function
over a convex set. Thus the value of the exponent $E_e(R, E_{\mathbf{x}}, \alpha, P, \Pi)$ can be calculated numerically
relatively easily. Furthermore $E_e(R, E_{\mathbf{x}}, \alpha, P, \Pi)$ can be written in terms of solutions of lower
dimensional optimization problems (see equation (37)). However the problem of finding the optimal $(\alpha, P, \Pi)$
triple for a given $(R, E_{\mathbf{x}})$ pair is not that easy in general, as we will discuss in more detail in Section III-E.

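Note that for all control phase types $\Pi$ and control phase output types $U$, $D(U \| W_a | \Pi) \ge 0$ and
$D(U \| W_r | \Pi) \ge 0$. Using this fact together with the definitions of $E_r(R, P)$ and $\eta(R, P, V, \hat{V})$ given in
(4) and (13) we get:

$$E_e(R, E_{\mathbf{x}}, \alpha, P, \Pi) \ge \alpha E_r(\tfrac{R}{\alpha}, P) \qquad \forall (R, E_{\mathbf{x}}, \alpha, P, \Pi) \text{ s.t. } E_{\mathbf{x}} \le \alpha E_r(\tfrac{R}{\alpha}, P) \qquad (19)$$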

Since we are interested in quantifying the gains of errors-and-erasures decoding over decoding schemes without
erasures, we are ultimately interested only in the region where $E_{\mathbf{x}} \le \alpha E_r(\tfrac{R}{\alpha}, P)$ holds.
However equation (18) gives us the whole achievable region for the family of codes we are considering.
Proof: A decoder of the form given in (16) decodes correctly when $\tilde{M} = M$ and $(Y^n, M) \succ (Y^n, \tilde{m})$ for
all¹⁰ $\tilde{m} \ne M$. Thus an error or an erasure occurs only when the correct message does not dominate all other
messages, i.e. when $\exists \tilde{m} \ne M$ such that $(Y^n, M) \nsucc (Y^n, \tilde{m})$. Consequently, we can write the sum
of the conditional error and erasure probabilities for a message $m \in \mathcal{M}$ as,

$$P_{e|m} + P_{\mathbf{x}|m} = P[\{y^n : \exists \tilde{m} \ne m \text{ s.t. } (y^n, m) \nsucc (y^n, \tilde{m})\} \mid M = m]. \qquad (20)$$

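This can happen in two ways: either there is an error in the first phase, i.e. $\tilde{M} \ne m$, or the first phase
tentative decision is correct, i.e. $\tilde{M} = m$, but the second phase observation $y_{n_1+1}^{n}$ leads to an erasure,
i.e. $\hat{M} = \mathbf{x}$. For a decoder using a domination rule satisfying the constraints described in Section III-C,

$$P_{e|m} + P_{\mathbf{x}|m} \le \sum_{V} \sum_{\hat{V}:\ I(P, \hat{V}) \ge I(P, V)} \sum_{y^{n_1} \in F^{(n_1)}(V, \hat{V}, m)} P[y^{n_1} \mid M = m]$$

$$\qquad + \sum_{V} \sum_{\hat{V}:\ I(P, \hat{V}) \le I(P, V)} \sum_{y^{n_1} \in F^{(n_1)}(V, \hat{V}, m)} P[y^{n_1} \mid M = m] \sum_{U:\ (V, U, m) \nsucc (\hat{V}, U, m+1)} \sum_{y_{n_1+1}^{n} \in T_U} P[y_{n_1+1}^{n} \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(a)]$$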

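where¹¹ $F^{(n_1)}(V, \hat{V}, m)$ is the intersection of the $V$-shell of message $m \in \mathcal{M}$ with the
$\hat{V}$-shells of other messages, defined in equation (11). As a result of Corollary 1,

$$\sum_{y^{n_1} \in F^{(n_1)}(V, \hat{V}, m)} P[y^{n_1} \mid M = m] = P[F^{(n_1)}(V, \hat{V}, m) \mid M = m] \le e^{-n_1 \eta(\frac{R}{\alpha}, P, V, \hat{V})}.$$

Furthermore, as a result of equation (15a) we have

$$\sum_{y_{n_1+1}^{n} \in T_U} P[y_{n_1+1}^{n} \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(a)] = P[T_U \mid X_{n_1+1}^{n} = x_{n_1+1}^{n}(a)] \le e^{-(n - n_1) D(U \| W_a | \Pi)}.$$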

In addition the number of different non-empty $V$-shells in the communication phase is less than
$(n_1 + 1)^{|\mathcal{X}||\mathcal{Y}|}$ and the number of non-empty $U$-shells in the control phase is less than
$(n - n_1 + 1)^{|\mathcal{X}|^2 |\mathcal{Y}|}$. We denote the set of $(V, \hat{V}, U)$ triples that correspond to erasures
with a correct tentative decision by $\mathcal{V}_{\mathbf{x}}$:

$$\mathcal{V}_{\mathbf{x}} \triangleq \{(V, \hat{V}, U) : I(P, V) \ge I(P, \hat{V}) \text{ and } (PV)_Y = (P\hat{V})_Y \text{ and } (V, U, m) \nsucc (\hat{V}, U, m + 1)\}. \qquad (21)$$

In the above definition $m$ is a dummy variable and $\mathcal{V}_{\mathbf{x}}$ is the same set for all $m \in \mathcal{M}$. Thus
using (21) we get

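$$P_{e|m} + P_{\mathbf{x}|m} \le (n_1 + 1)^{2|\mathcal{X}||\mathcal{Y}|} \max_{V, \hat{V}:\ I(P, V) \le I(P, \hat{V})} e^{-n_1 \eta(\frac{R}{\alpha}, P, V, \hat{V})}$$

$$\qquad + (n_1 + 1)^{2|\mathcal{X}||\mathcal{Y}|} (n - n_1 + 1)^{|\mathcal{X}|^2 |\mathcal{Y}|} \max_{(V, \hat{V}, U) \in \mathcal{V}_{\mathbf{x}}} e^{-\left( n_1 \eta(\frac{R}{\alpha}, P, V, \hat{V}) + (n - n_1) D(U \| W_a | \Pi) \right)}.$$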
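The polynomial prefactors in the bound above come from counting shells. As a sanity check, the exact number of conditional types with respect to a given sequence can be compared with the bound $(n + 1)^{|\mathcal{X}||\mathcal{Y}|}$. The sketch below is an illustration; the function names and the stars-and-bars counting are assumptions made here, not part of the paper's proof.

```python
from math import comb

def num_conditional_types(composition, ny):
    """Number of conditional types V w.r.t. a sequence whose input letter counts are `composition`."""
    total = 1
    for count in composition:
        # Ways to split `count` uses of this input letter among ny output letters.
        total *= comb(count + ny - 1, ny - 1)
    return total

def shell_count_bound(n, nx, ny):
    """Polynomial bound (n+1)^(|X||Y|) on the number of non-empty V-shells."""
    return (n + 1) ** (nx * ny)

exact = num_conditional_types([5, 5], 2)  # n = 10, binary input type (1/2, 1/2), binary output
bound = shell_count_bound(10, 2, 2)
```

For this example there are 36 non-empty shells against a bound of 14641, so the bound is loose, but it grows only polynomially in $n$ and therefore does not affect the exponents.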

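Using the definition of $E_r(\tfrac{R}{\alpha}, P)$ given in (4) we get

$$P_{e|m} + P_{\mathbf{x}|m} \le e^{n \delta'_n} \max\left\{ e^{-n \alpha E_r(\frac{R}{\alpha}, P)},\ e^{-n \min_{(V, \hat{V}, U) \in \mathcal{V}_{\mathbf{x}}} \left( \alpha \eta(\frac{R}{\alpha}, P, V, \hat{V}) + (1 - \alpha) D(U \| W_a | \Pi) \right)} \right\} \qquad (22)$$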

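On the other hand an error occurs only when an incorrect message dominates all other messages, i.e. when
$\exists \tilde{m} \ne m$ such that $(Y^n, \tilde{m}) \succ (Y^n, \bar{m})$ for all $\bar{m} \ne \tilde{m}$:

$$P_{e|m} = P[\{y^n : \exists \tilde{m} \ne m \text{ s.t. } (y^n, \tilde{m}) \succ (y^n, \bar{m})\ \forall \bar{m} \ne \tilde{m}\} \mid M = m].$$

Note that when an $\tilde{m} \in \mathcal{M}$ dominates all other $\bar{m} \ne \tilde{m}$, it also dominates $m$, i.e.

$$\{y^n : \exists \tilde{m} \ne m \text{ s.t. } (y^n, \tilde{m}) \succ (y^n, \bar{m})\ \forall \bar{m} \ne \tilde{m}\} \subset \{y^n : \exists \tilde{m} \ne m \text{ s.t. } (y^n, \tilde{m}) \succ (y^n, m)\}.$$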

10 We use the shorthand $(Y^n, \tilde{M}) \succ (Y^n, \tilde{m})$ for $(V_{Y^{n_1}|\tilde{M}}, U_{Y_{n_1+1}^{n}}, \tilde{M}) \succ (V_{Y^{n_1}|\tilde{m}}, U_{Y_{n_1+1}^{n}}, \tilde{m})$ in the rest of this section.
11 Note that for the case when $m = |\mathcal{M}|$, we need to replace $(V, U, m) \nsucc (\hat{V}, U, m+1)$ with $(V, U, m-1) \nsucc (\hat{V}, U, m)$.
Thus,

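$$P_{e|m} \leq P\left[\{y^n : \exists \tilde{m} \neq m \text{ s.t. } (y^n, \tilde{m}) \succ (y^n, m)\} \,\middle|\, M = m\right]$$

$$\quad = \sum_{\hat{V}} \sum_{V: I(P,V) \geq I(P,\hat{V})} \; \sum_{y^{n_1} \in F^{(n_1)}(\hat{V},V,m)} P[y^{n_1} | M = m] \sum_{U: (V,U,m-1) \succ (\hat{V},U,m)} \; \sum_{y_{n_1+1}^{n} \in T_U} P[y_{n_1+1}^{n} | x_{n_1+1}^{n}(r)]. \qquad (23)$$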

The tentative decision is not equal to $m$ only if there is a message with a strictly higher empirical mutual information or if there is a message which has equal mutual information but a smaller index. This is the reason why we sum over $(V, U, m-1) \succ (\hat{V}, U, m)$. Using the inequality (15b) in the innermost two sums and then applying inequality (12) we get,

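$$P_{e|m} \leq (n+1)^{(|\mathcal{X}|^2 + 2|\mathcal{X}|)|\mathcal{Y}|} \max_{\substack{(V,\hat{V},U):\; I(P,V) \geq I(P,\hat{V}) \\ (V,U,m-1) \succ (\hat{V},U,m)}} e^{-n\left(\alpha\eta(R/\alpha,P,\hat{V},V) + (1-\alpha) D(U \| W_r | \Pi)\right)}$$

$$\quad \leq e^{n\delta_n} e^{-n \min_{(V,\hat{V},U) \in \mathcal{V}_e} \left(\alpha\eta(R/\alpha,P,\hat{V},V) + (1-\alpha) D(U \| W_r | \Pi)\right)} \qquad (24)$$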

where $\mathcal{V}_e$ is the complement of $\mathcal{V}_x$ in $\mathcal{V}$ given by

$$\mathcal{V}_e = \{(V,\hat{V},U) : I(P,V) \geq I(P,\hat{V}) \text{ and } (PV)_Y = (P\hat{V})_Y \text{ and } (V,U,m) \succ (\hat{V},U,m+1)\}. \qquad (25)$$

Note that $m$ in the definition of $\mathcal{V}_e$ is also a dummy variable. The domination rule $\succ$ divides the set $\mathcal{V}$ into two subsets: the erasure subset $\mathcal{V}_x$ and the error subset $\mathcal{V}_e$. Choosing the domination rule is equivalent to choosing $\mathcal{V}_e$. Depending on the values of $\alpha E_r(R/\alpha, P)$ and $E_x$ we choose different $\mathcal{V}_e$'s as follows:
(i) $E_x > \alpha E_r(R/\alpha, P)$: $\mathcal{V}_e = \mathcal{V}$. Then $\mathcal{V}_x = \emptyset$ and Theorem 1 follows.



(ii) $E_x \leq \alpha E_r(R/\alpha, P)$:

$$\mathcal{V}_e = \left\{(V,\hat{V},U) : I(P,V) \geq I(P,\hat{V}) \text{ and } (PV)_Y = (P\hat{V})_Y \text{ and } \alpha\eta(R/\alpha,P,V,\hat{V}) + (1-\alpha) D(U \| W_a | \Pi) \leq E_x \right\}.$$

Then all the $(V, \hat{V}, U)$ triples satisfying $\alpha\eta(R/\alpha,P,V,\hat{V}) + (1-\alpha) D(U \| W_a | \Pi) \leq E_x$ are in the error subset. Thus as a result of (22) the erasure probability is bounded as $P_x \leq e^{-n(E_x - \delta_n)}$ and Theorem 1 follows from equation (24).



E. Lower Bound to $\mathcal{E}_e(R, E_x)$:

In this section we use Theorem 1 to derive a lower bound to the optimal error exponent $\mathcal{E}_e(R, E_x)$. We do that by optimizing the achievable performance $E_e(R, E_x, \alpha, P, \Pi)$ over $\alpha$, $P$ and $\Pi$.

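1) High Erasure Exponent Region (i.e. $E_x > E_r(R)$): As a result of (18), $\forall R \geq 0$ and $\forall E_x > E_r(R)$

$$E_e(R, E_x, \alpha, P, \Pi) = \alpha E_r(R/\alpha, P) \leq E_r(R) \qquad \forall \alpha,\ \forall P,\ \forall \Pi \qquad (26a)$$

$$E_e(R, E_x, \alpha, P, \Pi) = E_r(R) \qquad \alpha = 1,\ P = \arg\max_{\tilde{P}} E_r(R, \tilde{P}),\ \forall \Pi. \qquad (26b)$$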

Thus for all $(R, E_x)$ pairs such that $E_x > E_r(R)$: the optimal time sharing constant is 1, the optimal input distribution is the optimal input distribution for the random coding exponent at rate $R$, we use maximum mutual information decoding and never declare erasures. Furthermore since $\alpha = 1$ we have only a single phase in our scheme.

$$E_e(R, E_x) = E_e(R, E_x, 1, P_{r(R)}, \Pi) = E_r(R) \qquad \forall R \geq 0,\ \forall E_x > E_r(R) \qquad (27)$$

where $P_{r(R)}$ satisfies $E_r(R, P_{r(R)}) = E_r(R)$ and $\Pi$ can be any control phase type. Evidently the benefits of errors-and-erasures decoding are not observed in this region.
2) Low Erasure Exponent Region (i.e. $E_x \leq E_r(R)$): We observe and quantify the benefits of errors-and-erasures decoding for $(R, E_x)$ pairs such that $E_x \leq E_r(R)$. Since $E_r(R)$ is a non-negative, non-increasing and convex function of $R$, we have

$$\alpha \in [\alpha^*(R, E_x), 1] \;\Leftrightarrow\; E_x \leq \alpha E_r(R/\alpha) \qquad \forall R \geq 0,\ \forall\, 0 < E_x \leq E_r(R)$$

where $\alpha^*(R, E_x)$ is the unique solution of the equation $\alpha E_r(R/\alpha) = E_x$.

For the case $E_x = 0$, however, $\alpha E_r(R/\alpha) = 0$ has multiple solutions, and Theorem 1 holds but the resulting error exponent, $E_e(R, 0, \alpha, P, \Pi)$, does not correspond to the error exponent of a reliable sequence. The convention introduced below in equation (28) addresses both issues at once, by choosing the minimum of those solutions as $\alpha^*(R, 0)$. In addition, by this convention $\alpha^*(R, E_x)$ is also continuous at $E_x = 0$: $\lim_{E_x \to 0} \alpha^*(R, E_x) = \alpha^*(R, 0)$.

$$\alpha^*(R, E_x) \triangleq \begin{cases} \dfrac{R}{g^{-1}(E_x/R)} & E_x \in (0, E_r(R)] \\[2mm] \dfrac{R}{C} & E_x = 0 \end{cases} \qquad (28)$$

where $g^{-1}(\cdot)$ is the inverse of the function $g(r) = \frac{E_r(r)}{r}$.
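As a numerical illustration of the definition of $\alpha^*(R, E_x)$, the following minimal sketch solves $\alpha E_r(R/\alpha) = E_x$ by bisection for a binary symmetric channel. It assumes Gallager's closed form of $E_0(\rho)$ for a BSC with uniform inputs and evaluates $E_r$ by a grid search over $\rho$; the function names and the grid resolution are illustrative choices, not part of the paper.

```python
import math

def e0_bsc(rho, eps):
    """Gallager's E_0(rho) in nats for a BSC(eps) with uniform inputs."""
    t = 1.0 / (1.0 + rho)
    return rho * math.log(2) - (1.0 + rho) * math.log(eps ** t + (1.0 - eps) ** t)

def er_bsc(rate, eps, grid=2000):
    """Random coding exponent E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R."""
    return max(e0_bsc(k / grid, eps) - (k / grid) * rate for k in range(grid + 1))

def capacity_bsc(eps):
    """Capacity of a BSC(eps) in nats per channel use."""
    h = -eps * math.log(eps) - (1.0 - eps) * math.log(1.0 - eps)
    return math.log(2) - h

def alpha_star(rate, e_x, eps, tol=1e-10):
    """Solve alpha * E_r(R / alpha) = E_x on [R/C, 1]; assumes 0 < R < C and 0 <= E_x <= E_r(R)."""
    cap = capacity_bsc(eps)
    if e_x == 0:
        return rate / cap  # convention of equation (28)
    lo, hi = rate / cap, 1.0
    # alpha * E_r(R / alpha) is non-decreasing in alpha, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid * er_bsc(rate / mid, eps) < e_x:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `alpha_star(0.0862, 0.5 * er_bsc(0.0862, 0.25), 0.25)` returns a time sharing constant strictly between $R/C$ and 1.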

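As a result of equations (18) and (28), $\forall R \geq 0$ and $\forall\, 0 \leq E_x \leq E_r(R)$ we have

$$E_e(R, E_x, \alpha, P, \Pi) = \alpha E_r(R/\alpha, P) \leq E_r(R) \qquad \forall \alpha \in [0, \alpha^*(R, E_x)),\ \forall P,\ \forall \Pi \qquad (29a)$$

$$E_e(R, E_x, \alpha, P, \Pi) = E_r(R) \qquad \alpha = 1,\ P = \arg\max_{\tilde{P}} E_r(R, \tilde{P}),\ \forall \Pi. \qquad (29b)$$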

Thus for all $(R, E_x)$ pairs such that $E_x \leq E_r(R)$ the optimal time sharing constant is in the interval $[\alpha^*(R, E_x), 1]$. For an $(R, E_x, \alpha)$ triple such that $R \geq 0$, $E_x \leq E_r(R)$ and $\alpha \in [\alpha^*(R, E_x), 1]$ let $\mathcal{P}(R, E_x, \alpha)$ be

$$\mathcal{P}(R, E_x, \alpha) = \{P : \alpha E_r(R/\alpha, P) \geq E_x,\ I(P, W) \geq R/\alpha\}. \qquad (30)$$

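The constraint on mutual information is there to ensure that the $E_e(R, 0, \alpha, P, \Pi)$'s correspond to error exponents of reliable sequences. The set $\mathcal{P}(R, E_x, \alpha)$ is convex because $E_r(R, P)$ and $I(P, W)$ are concave in $P$.
Note that $\forall R \geq 0$ and $\forall E_x \in (0, E_r(R)]$,

$$E_e(R, E_x, \alpha, P, \Pi) = \alpha E_r(R/\alpha, P) \qquad \forall \alpha \in [\alpha^*(R, E_x), 1],\ \forall P \notin \mathcal{P}(R, E_x, \alpha),\ \forall \Pi \qquad (31a)$$

$$E_e(R, E_x, \alpha, P, \Pi) \geq \alpha E_r(R/\alpha) \qquad \forall \alpha \in [\alpha^*(R, E_x), 1],\ P = \arg\max_{\tilde{P}} E_r(R/\alpha, \tilde{P}),\ \forall \Pi. \qquad (31b)$$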

Thus as a result of (31) we can restrict the optimization over $P$ to $\mathcal{P}(R, E_x, \alpha)$ when $R \geq 0$ and $E_x \in (0, E_r(R)]$. For the $E_x = 0$ case, if we require the expression $E_e(R, 0, \alpha, P, \Pi)$ to correspond to the error exponent of a reliable sequence, we get the restriction given in equation (30). Thus using the definition of $E_e(R, E_x)$ given in (42) we get:

$$E_e(R, E_x) = \max_{\alpha \in [\alpha^*(R, E_x), 1]} \; \max_{P \in \mathcal{P}(R, E_x, \alpha)} \; \max_{\Pi} E_e(R, E_x, \alpha, P, \Pi) \qquad \forall R \geq 0,\ \forall E_x \leq E_r(R) \qquad (32)$$

where $\alpha^*(R, E_x)$, $\mathcal{P}(R, E_x, \alpha)$ and $E_e(R, E_x, \alpha, P, \Pi)$ are given in equations (28), (30) and (18), respectively.

Unlike $E_e(R, E_x, \alpha, P, \Pi)$ itself, $E_e(R, E_x)$ as defined in (32) corresponds to the error exponent of reliable code sequences even at $E_x = 0$.

If the maximizing $P$ for the inner maximization in equation (32) is the same for all $\alpha \in [\alpha^*(R, E_x), 1]$, the optimal value of $\alpha$ is $\alpha^*(R, E_x)$. In order to see that, we first observe that for any fixed $(R, E_x, P, \Pi)$ such that $E_r(R, P) \geq E_x$, the function $E_e(R, E_x, \alpha, P, \Pi)$ is convex in $\alpha$ for all $\alpha \in [\alpha^*(R, E_x, P), 1]$, where $\alpha^*(R, E_x, P)$ is the unique solution of the equation^{12} $\alpha E_r(R/\alpha, P) = E_x$, as shown in Lemma 10 in Appendix B. Since maximization preserves convexity, $\max_\Pi E_e(R, E_x, \alpha, P, \Pi)$ is also convex in $\alpha$ for all $\alpha \in [\alpha^*(R, E_x, P), 1]$. Thus for any $(R, E_x, P)$ triple, $\max_\Pi E_e(R, E_x, \alpha, P, \Pi)$ takes its maximum value either at the minimum possible value of $\alpha$, i.e. $\alpha^*(R, E_x, P) = \alpha^*(R, E_x)$, or at the maximum possible value of $\alpha$, i.e. 1. It is shown in Appendix C that $\max_\Pi E_e(R, E_x, \alpha, P, \Pi)$ takes its maximum value at $\alpha = \alpha^*(R, E_x)$.

12 Evidently we need to make a minor modification for the $E_x = 0$ case as before to ensure that we consider only the $E_e(R, E_x, \alpha, P, \Pi)$'s that correspond to reliable sequences: $\alpha^*(R, 0, P) = \frac{R}{I(P, W)}$.
Furthermore if the maximizing $P$ is not only the same for all $\alpha \in [\alpha^*(R, E_x), 1]$ for a given $(R, E_x)$ pair but also for all $(R, E_x)$ pairs such that $E_x \leq E_r(R)$, then we can find the optimal $E_e(R, E_x)$ by simply maximizing over $\Pi$'s. In symmetric channels, for example, the uniform distribution is the optimal distribution for all $(R, E_x)$ pairs. Thus

$$E_e(R, E_x) = \begin{cases} E_e(R, E_x, 1, P^*, \Pi) & \text{if } E_x > E_r(R, P^*) \\ \max_\Pi E_e(R, E_x, \alpha^*(R, E_x), P^*, \Pi) & \text{if } E_x \leq E_r(R, P^*) \end{cases} \qquad (33)$$

where $P^*$ is the uniform distribution.

F. Alternative Expression for Exponent: 

The minimization given in (18) for $E_e(R, E_x, \alpha, P, \Pi)$ is over transition probability matrices and control phase output types. In order to get a better grasp of the resulting expression, we simplify the analytical expression in this section. We do that by expressing the minimization in (18) in terms of solutions of lower dimensional optimization problems.

Let $\zeta(R, P, Q)$ be the minimum Kullback-Leibler divergence under $P$ with respect to $W$ among the transition probability matrices whose mutual information under $P$ is less than $R$ and whose output distribution under $P$ is $Q$. It is shown in Appendix B that for a given $P$, $\zeta(R, P, Q)$ is convex in the $(R, Q)$ pair. Evidently for a given $(P, Q)$ pair $\zeta(R, P, Q)$ is non-increasing in $R$. Thus for a given $(P, Q)$ pair $\zeta(R, P, Q)$ is strictly decreasing on a closed interval and is an extended real valued function of the form:

$$\zeta(R, P, Q) = \begin{cases} \infty & R < R_l^*(P, Q) \\ \min_{V:\, I(P,V) \leq R,\ (PV)_Y = Q} D(V \| W | P) & R \in [R_l^*(P, Q), R_h^*(P, Q)] \\ \min_{V:\, (PV)_Y = Q} D(V \| W | P) & R > R_h^*(P, Q) \end{cases} \qquad (34a)$$

$$R_l^*(P, Q) = \min_{V:\, PV \ll PW,\ (PV)_Y = Q} I(P, V) \qquad (34b)$$

$$R_h^*(P, Q) = \min\left\{R : \min_{V:\, I(P,V) \leq R,\ (PV)_Y = Q} D(V \| W | P) = \min_{V:\, (PV)_Y = Q} D(V \| W | P)\right\} \qquad (34c)$$

where $PV \ll PW$ iff for all $(x, y)$ pairs such that $P(x)W(y|x)$ is zero, $P(x)V(y|x)$ is also zero.

Let $\Gamma(T, \Pi)$ be the minimum Kullback-Leibler divergence with respect to $W_r$ under $\Pi$, among the $U$'s whose Kullback-Leibler divergence with respect to $W_a$ under $\Pi$ is less than or equal to $T$.

$$\Gamma(T, \Pi) = \min_{U:\, D(U \| W_a | \Pi) \leq T} D(U \| W_r | \Pi) \qquad (35)$$

For a given $\Pi$, $\Gamma(T, \Pi)$ is non-increasing and convex in $T$, thus $\Gamma(T, \Pi)$ is strictly decreasing in $T$ on a closed interval. An equivalent expression for $\Gamma(T, \Pi)$ and the boundaries of this closed interval are derived in Appendix A.

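$$\Gamma(T, \Pi) = \begin{cases} \infty & \text{if } T < D(U_0 \| W_a | \Pi) \\ D(U_s \| W_r | \Pi) & \text{if } T = D(U_s \| W_a | \Pi) \text{ for some } s \in [0, 1] \\ D(U_1 \| W_r | \Pi) & \text{if } T > D(U_1 \| W_a | \Pi) \end{cases} \qquad (36)$$

where

$$U_s(y | x_1, x_2) = \begin{cases} \dfrac{1_{\{W(y|x_2) > 0\}} W(y|x_1)}{\sum_{\tilde{y}:\, W(\tilde{y}|x_2) > 0} W(\tilde{y}|x_1)} & \text{if } s = 0 \\[3mm] \dfrac{W(y|x_1)^{1-s} W(y|x_2)^{s}}{\sum_{\tilde{y}} W(\tilde{y}|x_1)^{1-s} W(\tilde{y}|x_2)^{s}} & \text{if } s \in (0, 1) \\[3mm] \dfrac{1_{\{W(y|x_1) > 0\}} W(y|x_2)}{\sum_{\tilde{y}:\, W(\tilde{y}|x_1) > 0} W(\tilde{y}|x_2)} & \text{if } s = 1 \end{cases}$$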

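For an $(R, E_x, \alpha, P, \Pi)$ such that $E_x \leq \alpha E_r(R/\alpha, P)$, using the definition of $E_e(R, E_x, \alpha, P, \Pi)$ in (18) together with the equations (13), (34) and (36) we get

$$E_e(R, E_x, \alpha, P, \Pi) = \min_{\substack{Q, R_1, R_2, T:\; R_1 \geq R_2 \geq 0,\ T \geq 0 \\ \alpha\zeta(R_1/\alpha, P, Q) + |R_2 - R|^+ + T \leq E_x}} \alpha\zeta\left(\tfrac{R_2}{\alpha}, P, Q\right) + |R_1 - R|^+ + (1-\alpha)\Gamma\left(\tfrac{T}{1-\alpha}, \Pi\right)$$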

For any $(R, E_x, \alpha, P, \Pi)$ the above minimum is also achieved at a $(Q, R_1, R_2, T)$ such that $R_1 \geq R_2 \geq R$. In order to see this take any minimizing $(Q^*, R_1^*, R_2^*, T^*)$; then there are three possibilities:
(a) $R_1^* \geq R_2^* \geq R$: the claim holds trivially.

(b) $R_1^* \geq R > R_2^*$: since $\zeta(\frac{R}{\alpha}, P, Q)$ is a non-increasing function of $R$, $(Q^*, R_1^*, R, T^*)$ is also minimizing, thus the claim holds.

(c) $R > R_1^* \geq R_2^*$: since $\zeta(\frac{R}{\alpha}, P, Q)$ is a non-increasing function of $R$, $(Q^*, R, R, T^*)$ is also minimizing, thus the claim holds.
Thus we obtain the following expression for $E_e(R, E_x, \alpha, P, \Pi)$,

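$$E_e(R, E_x, \alpha, P, \Pi) = \begin{cases} \alpha E_r(R/\alpha, P) & \text{if } E_x > \alpha E_r(R/\alpha, P) \\[2mm] \min\limits_{\substack{Q, R_1, R_2, T:\; R_1 \geq R_2 \geq R,\ T \geq 0 \\ \alpha\zeta(R_1/\alpha, P, Q) + R_2 - R + T \leq E_x}} \alpha\zeta\left(\tfrac{R_2}{\alpha}, P, Q\right) + R_1 - R + (1-\alpha)\Gamma\left(\tfrac{T}{1-\alpha}, \Pi\right) & \text{if } E_x \leq \alpha E_r(R/\alpha, P) \end{cases} \qquad (37)$$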

Equation (37) is simplified further for symmetric channels. For symmetric channels,

$$E_{sp}(R) = \zeta(R, P^*, Q^*) = \min_{Q} \zeta(R, P^*, Q) \qquad (38)$$

where $P^*$ is the uniform input distribution and $Q^*$ is the corresponding output distribution under $W$.

Using the alternative expression for $E_e(R, E_x, \alpha, P, \Pi)$ given in (37) together with equations (33) and (38), for symmetric channels we get,

$$E_e(R, E_x) = \begin{cases} E_r(R) & \text{if } E_x > E_r(R) \\[2mm] \max\limits_{\Pi} \min\limits_{\substack{R'', R', T:\; R'' \geq R' \geq R,\ T \geq 0 \\ \alpha^* E_{sp}(R''/\alpha^*) + R' - R + T \leq E_x}} \alpha^* E_{sp}\left(\tfrac{R'}{\alpha^*}\right) + R'' - R + (1-\alpha^*)\Gamma\left(\tfrac{T}{1-\alpha^*}, \Pi\right) & \text{if } E_x \leq E_r(R) \end{cases} \qquad (39)$$

where $\alpha^*(R, E_x)$ is given in equation (28).

Although (38) does not hold in general, using the definitions of $\zeta(R, P, Q)$ and $E_{sp}(R, P)$ we can assert that

$$\zeta(R, P, Q) \geq \min_{\tilde{Q}} \zeta(R, P, \tilde{Q}) = E_{sp}(R, P). \qquad (40)$$



Note that (40) can be used to bound the minimized expression in (37) from below. In addition recall that if the set that a minimization is done over is enlarged, the resulting minimum can not increase. We can use (37) also to enlarge the set that the minimization is done over in (40). Thus we get an exponent $\tilde{E}_e(R, E_x, \alpha, P, \Pi)$ which is smaller than or equal to $E_e(R, E_x, \alpha, P, \Pi)$ in all channels and for all $(R, E_x, \alpha, P, \Pi)$'s:

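$$\tilde{E}_e(R, E_x, \alpha, P, \Pi) = \begin{cases} \alpha E_r(R/\alpha, P) & \text{if } E_x > \alpha E_r(R/\alpha, P) \\[2mm] \min\limits_{\substack{R'', R', T:\; R'' \geq R' \geq R,\ T \geq 0 \\ \alpha E_{sp}(R''/\alpha, P) + R' - R + T \leq E_x}} \alpha E_{sp}\left(\tfrac{R'}{\alpha}, P\right) + R'' - R + (1-\alpha)\Gamma\left(\tfrac{T}{1-\alpha}, \Pi\right) & \text{if } E_x \leq \alpha E_r(R/\alpha, P) \end{cases} \qquad (41)$$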



After an investigation very similar to the one we have already done for $E_e(R, E_x, \alpha, P, \Pi)$ in Section III-E, we obtain the expression below for the optimal error exponent for reliable sequences emerging from (41):

$$\tilde{E}_e(R, E_x) = \begin{cases} E_r(R) & \forall R \geq 0,\ \forall E_x > E_r(R) \\[1mm] \max\limits_{\alpha \in [\alpha^*(R, E_x), 1]} \; \max\limits_{P \in \mathcal{P}(R, E_x, \alpha)} \; \max\limits_{\Pi} \tilde{E}_e(R, E_x, \alpha, P, \Pi) & \forall R \geq 0,\ \forall E_x \leq E_r(R) \end{cases} \qquad (42)$$

where $\alpha^*(R, E_x)$, $\mathcal{P}(R, E_x, \alpha)$ and $\tilde{E}_e(R, E_x, \alpha, P, \Pi)$ are given in equations (28), (30) and (41), respectively.
G. Special Cases 

1) Zero Erasure Exponent Case, $\mathcal{E}_e(R, 0)$: Using a simple repetition-at-erasures scheme, fixed length errors-and-erasures codes can be converted into variable length block-codes with the same error exponent. Thus the error exponent of variable length block-codes given by Burnashev in [3] is an upper bound to the error exponent of fixed length block-codes with erasures:

$$\mathcal{E}_e(R, E_x) \leq \left(1 - \tfrac{R}{C}\right)\mathcal{D} \qquad \forall R \geq 0,\ E_x \geq 0$$

where $\mathcal{D} = \max_{x, \tilde{x}} \sum_{y} W(y|x) \log \frac{W(y|x)}{W(y|\tilde{x})}$.
We show below that $\tilde{E}_e(R, 0) \geq (1 - \frac{R}{C})\mathcal{D}$. This implies that our coding scheme is optimal for $E_x = 0$ for all rates, i.e. $E_e(R, 0) = \mathcal{E}_e(R, 0) = (1 - \frac{R}{C})\mathcal{D}$.
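To make the zero erasure exponent bound concrete, the sketch below evaluates $\mathcal{D}$ for a channel given as a row-stochastic matrix and then the bound $(1 - R/C)\mathcal{D}$. It is only an illustration: natural logarithms are assumed so that the result is in nats, and the function names and the BSC used in the example are not taken from the paper.

```python
import math

def burnashev_d(W):
    """max over input pairs (x, x~) of sum_y W[x][y] * ln(W[x][y] / W[x~][y]), in nats."""
    best = 0.0
    for wx in W:
        for wz in W:
            d = 0.0
            for p, q in zip(wx, wz):
                if p > 0:
                    if q == 0:
                        d = math.inf  # divergence is infinite when supports differ
                        break
                    d += p * math.log(p / q)
            best = max(best, d)
    return best

def zero_erasure_exponent_bound(rate, capacity, d):
    """The bound (1 - R/C) * D on the error exponent at zero erasure exponent."""
    return (1.0 - rate / capacity) * d
```

For a BSC with crossover probability 0.25 the maximizing pair is the two distinct inputs and $\mathcal{D} = \frac{1}{2}\ln 3$ nats.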

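Recall that for all $R$ less than capacity $\alpha^*(R, 0) = \frac{R}{C}$. Furthermore for any $\alpha \geq \frac{R}{C}$

$$\mathcal{P}(R, 0, \alpha) = \{P : I(P, W) \geq \tfrac{R}{\alpha}\}.$$

Thus for any $(R, 0, \alpha, P)$ such that $P \in \mathcal{P}(R, 0, \alpha)$, the constraints $R'' \geq R' \geq R$, $T \geq 0$ and $\alpha E_{sp}(\frac{R''}{\alpha}, P) + R' - R + T \leq 0$ imply that $R' = R$, $R'' = \alpha I(P, W)$, $T = 0$. Consequently

$$\tilde{E}_e(R, 0, \alpha, P, \Pi) = \alpha\left[E_{sp}\left(\tfrac{R}{\alpha}, P\right) + I(P, W) - \tfrac{R}{\alpha}\right] + (1-\alpha) D(W_a \| W_r | \Pi) \qquad (43)$$

When we maximize over $\Pi$ and $P \in \mathcal{P}(R, 0, \alpha)$ we get:

$$\tilde{E}_e(R, 0, \alpha) = \max_{P \in \mathcal{P}(R, 0, \alpha)} \left[\alpha E_{sp}\left(\tfrac{R}{\alpha}, P\right) + \alpha I(P, W) - R\right] + (1-\alpha)\mathcal{D} \qquad \forall \alpha \in [\tfrac{R}{C}, 1]. \qquad (44)$$

If we simply insert the minimum possible value of $\alpha$, i.e. $\alpha^*(R, 0) = \frac{R}{C}$:

$$\tilde{E}_e(R, 0, \tfrac{R}{C}) = \max_{P \in \mathcal{P}(R, 0, R/C)} \left[\tfrac{R}{C} E_{sp}(C, P) + \tfrac{R}{C} I(P, W) - R\right] + \left(1 - \tfrac{R}{C}\right)\mathcal{D} = \left(1 - \tfrac{R}{C}\right)\mathcal{D}.$$

Thus $\tilde{E}_e(R, 0) \geq (1 - \frac{R}{C})\mathcal{D}$.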

Indeed one need not rely on the converse on variable length block-codes in order to establish the fact that $E_e(R, 0) = (1 - \frac{R}{C})\mathcal{D}$. The lower bound to the probability of error presented in the next section not only recovers this particular optimality result but also upper bounds the optimal error exponent, $\mathcal{E}_e(R, E_x)$, as a function of rate $R$ and erasure exponent $E_x$.

2) Channels with non-zero Zero Error Capacity: For channels with a non-zero zero-error capacity, one can use equation (18) to prove that, for any $E_x \leq E_r(R)$, $E_e(R, E_x) = \infty$. This implies that we can get error-free block-codes with this two phase coding scheme for any rate $R < C$ and any erasure exponent $E_x \leq E_r(R)$. As we discuss in Section V in more detail, this is the best erasure exponent for rates over the critical rate.

IV. An Outer Bound for Error Exponent Erasure Exponent Trade-off 

In this section we derive an upper bound on $\mathcal{E}_e(R, E_x)$ using previously known results on erasure free block-codes with feedback and a generalization of the straight line bound of Shannon, Gallager and Berlekamp [29]. We first present a lower bound on the minimum error probability of block-codes with feedback and erasures, in terms of that of shorter codes, in Section IV-A. Then in Section IV-B we give a brief overview of the outer bounds on the error exponents of erasure free block-codes with feedback. Finally in Section IV-C we use the relation we have derived in Section IV-A to tie together the previously known results we have summarized in Section IV-B and bound $\mathcal{E}_e(R, E_x)$.

A. A Trait of Minimum Error Probability of block-codes with Erasures 

Shannon, Gallager and Berlekamp in [29] considered fixed length block-codes with list decoding and established a family of lower bounds on the minimum error probability in terms of the product of minimum error probabilities of certain shorter codes. They have shown, [29, Theorem 1], that for fixed length block-codes with list decoding and without feedback

$$\mathcal{P}_e(M, n, L) \geq \mathcal{P}_e(M, n_1, L_1)\, \mathcal{P}_e(L_1 + 1, n - n_1, L) \qquad (45)$$

where $\mathcal{P}_e(M, n, L)$ denotes the minimum error probability of erasure free block codes of length $n$ with $M$ equally probable messages and with decoding list size $L$. As they have already pointed out in [29], this theorem continues to hold in the case when a feedback link is available from receiver to the transmitter; although the $\mathcal{P}_e$'s are different when feedback is available, the relation given in equation (45) still holds. They were interested in erasure free codes. We, on the other hand, are interested in block-codes which might have non-zero erasure probability. Accordingly we need to incorporate erasure probability as one of the parameters of the optimal error probability. This is what this section is dedicated to.
The decoded set $\hat{M}$ of a size $L$ list decoder with erasures is either a subset^{13} of $\mathcal{M}$ whose size is at most $L$, like the erasure-free case, or a set which only includes the erasure symbol, i.e. either $\hat{M} \subset \mathcal{M}$ such that $|\hat{M}| \leq L$ or $\hat{M} = \{\mathbf{x}\}$. The minimum error probability of length $n$ block-codes, with $M$ equally probable messages, decoding list size $L$ and erasure probability $P_x$ is denoted by $\mathcal{P}_e(M, n, L, P_x)$.

Theorem 2 below bounds the error probability of block codes with erasures and list decoding using the error probabilities of shorter codes with erasures and list decoding, like [29, Theorem 1] does in the erasure free case. Like its counterpart in the erasure free case, Theorem 2 is later used to establish outer bounds to error exponents.

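Theorem 2: For any $n$, $M$, $L$, $P_x$, $n_1 \leq n$, $L_1$, and $0 \leq s \leq 1$ the minimum error probability of fixed length block-codes with feedback satisfies

$$\mathcal{P}_e(M, n, L, P_x) \geq \mathcal{P}_e(M, n_1, L_1, s)\; \mathcal{P}_e\left(L_1 + 1,\ n - n_1,\ L,\ \frac{(1-s) P_x}{\mathcal{P}_e(M, n_1, L_1, s)}\right) \qquad (46)$$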

Let us first consider the following lemma which bounds the achievable error probability-erasure probability pairs for block-codes with a nonuniform a priori probability distribution, in terms of block codes with a uniform a priori probability distribution but fewer messages.

Lemma 2: For any length $n$ block-code with message set $\mathcal{M}$, a priori probability distribution $\varphi(\cdot)$ on $\mathcal{M}$, erasure probability $P_x$, list decoding size $L$, and any integer $K$

$$P_e \geq \Omega(\varphi, K)\; \mathcal{P}_e\left(K + 1, n, L, \frac{P_x}{\Omega(\varphi, K)}\right) \quad \text{where} \quad \Omega(\varphi, K) = \min_{S \subset \mathcal{M}:\, |S| = |\mathcal{M}| - K} \varphi(S) \qquad (47)$$

where $\mathcal{P}_e(K + 1, n, L, P_x)$ is the minimum error probability of length $n$ codes with $(K+1)$ equally probable messages and decoding list size $L$, with feedback if the original code has feedback and without feedback if the original code has not.

Note that $\Omega(\varphi, K)$ is the minimum error probability of a size $K$ decoder, if the posterior probability distribution on the messages is $\varphi$.
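The quantity $\Omega(\varphi, K)$ is easy to compute: the best size $K$ list keeps the $K$ most probable messages, so the error probability is the total mass of the remaining ones. A minimal sketch, with the posterior passed as a plain list of probabilities (an illustrative representation, not the paper's notation):

```python
def omega(phi, K):
    """Minimum probability of the true message falling outside a list of K messages.

    phi is a list of posterior probabilities; the optimal list holds the K
    largest entries, so the result is the sum of the len(phi) - K smallest ones.
    """
    if K >= len(phi):
        return 0.0
    return sum(sorted(phi)[: len(phi) - K])
```

For the posterior `[0.5, 0.3, 0.2]`, a list of size 1 leaves out the messages with probabilities 0.3 and 0.2, and a list of size 2 leaves out only the one with probability 0.2.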

Proof: If $\Omega(\varphi, K) = 0$ the lemma holds trivially. Thus we assume $\Omega(\varphi, K) > 0$ henceforth. For any size $(K+1)$ subset $\mathcal{M}'$ of $\mathcal{M}$, we can use the encoding scheme and the decoding rule of the original code for $\mathcal{M}$ to construct the following block-code for $\mathcal{M}'$. For each $m \in \mathcal{M}'$ use the encoding scheme for message $m$ in the original code, i.e.

$$X'_t(m, y^{t-1}) = X_t(m, y^{t-1}) \qquad \forall m \in \mathcal{M}',\ t \in [1, n],\ y^{t-1} \in \mathcal{Y}^{t-1}.$$

For all $y^n \in \mathcal{Y}^n$, if the original decoding rule declares erasure, the decoding rule of the new code declares erasure, else the decoded list is the intersection of the original decoded list with $\mathcal{M}'$:

$$\hat{M}' = \begin{cases} \mathbf{x} & \text{if } \hat{M} = \mathbf{x} \\ \hat{M} \cap \mathcal{M}' & \text{else} \end{cases}$$
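The construction of the restricted code's decoder can be sketched as a small wrapper around the original decoder. This is only an illustration of the rule just described: the decoder is modelled as a function from a channel output to either a set of messages or an erasure marker, and the names `ERASURE` and `restrict_decoder` are hypothetical.

```python
ERASURE = "x"  # hypothetical marker standing for the erasure symbol

def restrict_decoder(decode, subset):
    """Build the decoder of the code restricted to `subset` from the original `decode`."""
    subset = frozenset(subset)

    def decode_restricted(y):
        out = decode(y)
        if out == ERASURE:
            return ERASURE               # erasures are passed through unchanged
        return frozenset(out) & subset   # otherwise intersect the list with the subset

    return decode_restricted
```

Since the intersection can only shrink the list, the restricted decoder never exceeds the original list size $L$.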

Note that this is a length $n$ code with $(K+1)$ messages and list decoding size $L$. Furthermore for all $m \in \mathcal{M}'$ the conditional error and erasure probabilities, $P'_{x|m}$, $P'_{e|m}$, are equal to the conditional error and erasure probabilities in the original code, $P_{x|m}$, $P_{e|m}$. Thus

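$$\frac{1}{K+1} \sum_{m \in \mathcal{M}'} (P_{x|m}, P_{e|m}) \in \Psi(K+1, n, L) \qquad \forall \mathcal{M}' \subset \mathcal{M} \text{ such that } |\mathcal{M}'| = K + 1 \qquad (48)$$

where $\Psi(K+1, n, L)$ is the set of achievable (erasure probability, error probability) pairs for length $n$ block-codes with $(K+1)$ equally probable messages and with decoding list size $L$. Evidently $\Psi(M, n, L)$ is a convex set for any $(M, n, L)$ triple. Furthermore, $\forall a \geq 1$, $\forall b_1 \geq 0$, $\forall b_2 \geq 0$:

$$\psi \in \Psi(M, n, L) \;\Rightarrow\; (a\psi + (b_1, b_2)) \in \Psi(M, n, L) \qquad (49)$$

Note that $\Psi(M, n, L)$ is uniquely determined by $\mathcal{P}_e(M, n, L, s_x)$ for $s_x \in [0, 1]$ and vice versa:

$$\mathcal{P}_e(M, n, L, \psi_x) = \min_{\psi_e:\, (\psi_x, \psi_e) \in \Psi(M, n, L)} \psi_e \qquad \forall (M, n, L, \psi_x). \qquad (50)$$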

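13 Note that if $\hat{M} \subset \mathcal{M}$ then $\mathbf{x} \notin \hat{M}$ because $\mathbf{x} \notin \mathcal{M}$.
Let the smallest non-zero element of $\{\varphi(1), \varphi(2), \ldots, \varphi(|\mathcal{M}|)\}$ be $\varphi(\xi_1)$. For any size $(K+1)$ subset of $\mathcal{M}$ which includes $\xi_1$ and all of whose elements have non-zero probabilities, say $\mathcal{M}_1$, we have,

$$(P_x, P_e) = \sum_{m \in \mathcal{M}} \varphi(m) (P_{x|m}, P_{e|m})$$

$$\quad = \sum_{m \in \mathcal{M}} \left[\varphi(m) - \varphi(\xi_1) 1_{\{m \in \mathcal{M}_1\}}\right] (P_{x|m}, P_{e|m}) + \varphi(\xi_1) \sum_{m \in \mathcal{M}_1} (P_{x|m}, P_{e|m})$$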

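As a result of equation (48) and the definition of $\Psi(K+1, n, L)$ we can conclude that $\exists \psi_1 \in \Psi(K+1, n, L)$ such that

$$(P_x, P_e) = \sum_{m \in \mathcal{M}} \varphi^{(1)}(m) (P_{x|m}, P_{e|m}) + \bar{\varphi}(\xi_1) \psi_1 \qquad (51)$$

where $\bar{\varphi}(\xi_1) = (K+1)\varphi(\xi_1)$ and $\varphi^{(1)}(m) = \varphi(m) - \varphi(\xi_1) 1_{\{m \in \mathcal{M}_1\}}$. Consequently

$$\bar{\varphi}(\xi_1) + \sum_{m \in \mathcal{M}} \varphi^{(1)}(m) = 1 \qquad (52)$$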

* — 'me.M 

Furthermore the number of non-zero $\varphi^{(1)}(m)$'s is at least one less than that of non-zero $\varphi(m)$'s. The remaining probabilities, $\varphi^{(1)}(m)$, have a minimum, $\varphi^{(1)}(\xi_2)$, among their non-zero elements. We can repeat the same argument once more using that element and reduce the number of non-zero elements by at least one more. After at most $|\mathcal{M}| - K$ such iterations, say $J$ of them, we reach a $\varphi^{(J)}$ which is non-zero for $K$ or fewer messages:

$$(P_x, P_e) = \sum_{j=1}^{J} \bar{\varphi}(\xi_j) \psi_j + \sum_{m \in \mathcal{M}} \varphi^{(J)}(m) (P_{x|m}, P_{e|m}) \qquad (53)$$

where $\varphi^{(J)}(m) \leq \varphi(m)$ for all $m$ in $\mathcal{M}$ and $\sum_{m \in \mathcal{M}} 1_{\{\varphi^{(J)}(m) > 0\}} \leq K$. Thus as a result of the definition of $\Omega(\varphi, K)$ given in equation (47),

$$\Omega(\varphi, K) \leq \sum_{j=1}^{J} \bar{\varphi}(\xi_j). \qquad (54)$$

Note that in equation (53), the first sum is equal to a convex combination of $\psi_j$'s multiplied by $\sum_{j=1}^{J} \bar{\varphi}(\xi_j)$; the second sum is equal to a pair with non-negative entries. Using the convexity of $\Psi(K+1, n, L)$, the identity in (49) and the equation (54) we see that

$$\exists \psi \in \Psi(K+1, n, L) \text{ such that } (P_x, P_e) = \Omega(\varphi, K)\, \psi \qquad (55)$$

The lemma follows from equations (55) and (50). ■
For proving Theorem 2 we express the error and erasure probabilities as a convex combination of error and erasure probabilities of $(n - n_1)$ long block-codes with a priori probability distribution $\varphi_{y^{n_1}}(m) = P[m | y^{n_1}]$ over the messages and apply Lemma 2 together with convexity arguments similar to the ones above.

Proof (Theorem 2):

For all $m$ in $\mathcal{M}$, let $\Upsilon(m)$ be the decoding region of $m$, $\Upsilon(\mathbf{x})$ be the decoding region of the erasure symbol $\mathbf{x}$ and $\tilde{\Upsilon}(m)$ the error region of $m$:

$$\Upsilon(m) = \{y^n : m \in \hat{M}\} \qquad \Upsilon(\mathbf{x}) = \{y^n : \mathbf{x} \in \hat{M}\} \qquad \tilde{\Upsilon}(m) = \Upsilon(m)^c \cap \Upsilon(\mathbf{x})^c \qquad (56)$$



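Then for all $m \in \mathcal{M}$ we have

$$(P_{x|m}, P_{e|m}) = \left(P[\Upsilon(\mathbf{x}) | m],\ P[\tilde{\Upsilon}(m) | m]\right) \qquad (57)$$

Note that

$$P_{x|m} = \sum_{y^n \in \Upsilon(\mathbf{x})} P[y^n | m] = \sum_{y^{n_1}} P[y^{n_1} | m] \sum_{y_{n_1+1}^{n}:\, (y^{n_1}, y_{n_1+1}^{n}) \in \Upsilon(\mathbf{x})} P[y_{n_1+1}^{n} | m, y^{n_1}]$$

Then the erasure probability is

$$P_x = \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} \sum_{y^{n_1}} P[y^{n_1} | m] \sum_{y_{n_1+1}^{n}:\, (y^{n_1}, y_{n_1+1}^{n}) \in \Upsilon(\mathbf{x})} P[y_{n_1+1}^{n} | m, y^{n_1}] = \sum_{y^{n_1}} P[y^{n_1}]\, P_x(y^{n_1})$$

where $P_x(y^{n_1}) = \sum_{m \in \mathcal{M}} P[m | y^{n_1}] \sum_{y_{n_1+1}^{n}:\, (y^{n_1}, y_{n_1+1}^{n}) \in \Upsilon(\mathbf{x})} P[y_{n_1+1}^{n} | m, y^{n_1}]$.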

Note that for every $y^{n_1}$, $P_x(y^{n_1})$ is the erasure probability of a code of length $(n - n_1)$ with a priori probability distribution $\varphi_{y^{n_1}}(m) = P[m | y^{n_1}]$. Furthermore we can write the error probability $P_e$ as

$$P_e = \sum_{y^{n_1}} P[y^{n_1}] \left( \sum_{m \in \mathcal{M}} P[m | y^{n_1}] \sum_{y_{n_1+1}^{n}:\, (y^{n_1}, y_{n_1+1}^{n}) \in \tilde{\Upsilon}(m)} P[y_{n_1+1}^{n} | m, y^{n_1}] \right) = \sum_{y^{n_1}} P[y^{n_1}]\, P_e(y^{n_1})$$

where $P_e(y^{n_1})$ is the error probability of the very same length $(n - n_1)$ code. As a result of Lemma 2 we know that the pair $(P_e(y^{n_1}), P_x(y^{n_1}))$ satisfies

$$P_e(y^{n_1}) \geq \Omega(\varphi_{y^{n_1}}, L_1)\; \mathcal{P}_e\left(L_1 + 1,\ (n - n_1),\ L,\ \frac{P_x(y^{n_1})}{\Omega(\varphi_{y^{n_1}}, L_1)}\right) \qquad (58)$$

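Then for any $s \in [0, 1]$,

$$P_e \geq \sum_{y^{n_1}} P[y^{n_1}] (1-s)\, \Omega(\varphi_{y^{n_1}}, L_1)\; \mathcal{P}_e\left(L_1 + 1,\ (n - n_1),\ L,\ \frac{P_x(y^{n_1})}{\Omega(\varphi_{y^{n_1}}, L_1)}\right)$$

$$\quad \geq \left(\sum_{y^{n_1}} P[y^{n_1}] (1-s)\, \Omega(\varphi_{y^{n_1}}, L_1)\right) \mathcal{P}_e\left(L_1 + 1,\ (n - n_1),\ L,\ \frac{(1-s) P_x}{\sum_{y^{n_1}} P[y^{n_1}] (1-s)\, \Omega(\varphi_{y^{n_1}}, L_1)}\right) \qquad (59)$$

where the second inequality follows from the convexity of $\mathcal{P}_e(M, n, L, P_x)$ in $P_x$.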

Now consider a code which uses the first $n_1$ time units of the original encoding scheme as its encoding scheme. The decoder of this new code draws a real number from $[0, 1]$ uniformly at random, independently of $Y^{n_1}$ (and the message evidently). If this number is less than $s$ it declares erasure, else it makes a maximum likelihood decoding with a list of size $L_1$. Then the sum on the left hand side of expression (60) below is its error probability. But that probability is lower bounded by $\mathcal{P}_e(M, n_1, L_1, s)$, which is the minimum error probability over all length $n_1$ block-codes with $M$ messages and list size $L_1$, i.e.

$$\sum_{y^{n_1}} P[y^{n_1}] (1-s)\, \Omega(\varphi_{y^{n_1}}, L_1) \geq \mathcal{P}_e(M, n_1, L_1, s). \qquad (60)$$
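The randomized decoder used in this step can be sketched as follows. The sketch assumes the posterior given the received prefix is available as a dictionary from messages to probabilities; with equally probable messages, ranking by posterior coincides with maximum likelihood list decoding. The names `randomized_list_decoder` and `conditional_error` are illustrative.

```python
import random

ERASURE = "x"  # hypothetical marker standing for the erasure symbol

def randomized_list_decoder(posterior, list_size, s, rng=random):
    """Declare an erasure with probability s, else return the list_size most likely messages."""
    if rng.random() < s:
        return ERASURE
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    return set(ranked[:list_size])

def conditional_error(posterior, list_size, s):
    """Error probability given the received prefix: (1 - s) times the mass outside the list."""
    outside = sorted(posterior.values())[: max(len(posterior) - list_size, 0)]
    return (1.0 - s) * sum(outside)
```

Averaging `conditional_error` over the received prefixes gives the sum on the left hand side of (60).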

Then the theorem follows from equations (59) and (60) and the fact that $\mathcal{P}_e(M, n, L_1, P_x)$ is a decreasing function of $P_x$.

QED

Like the result of Shannon, Gallager and Berlekamp in [29, Theorem 1], Theorem 2 is correct both with and without feedback. Although the $\mathcal{P}_e$'s are different in each case, the relationship between them given in equation (46) holds in both cases.

B. Classic Results on Error Exponent of Erasure-free block-code with Feedback: 

In this section we give a very brief overview of the previously known results on the error probability of erasure free block-codes with feedback. These results are used in Section IV-C together with Theorem 2 to bound $\mathcal{E}_e(R, E_x)$ from above. Note that Theorem 2 only relates the error probability of longer codes to that of the shorter ones. It does not in and of itself bound the error probability. It is in a sense a tool to glue together various bounds on the error probability.

The first bound we consider is on the error exponent of erasure free block-codes with feedback. Haroutunian proved in [16] that, for any $(M_n, n, L_n)$ sequence of triples such that $\lim_{n \to \infty} \frac{\ln M_n - \ln L_n}{n} = R$,

$$\lim_{n \to \infty} \frac{-\ln \mathcal{P}_e(M_n, n, L_n, 0)}{n} \leq E_H(R) \qquad (61)$$

where

$$E_H(R) = \min_{V:\, C(V) \leq R} \max_{P} D(V \| W | P) \quad \text{and} \quad C(V) = \max_{P} I(P, V). \qquad (62)$$

The second bound we consider is on the tradeoff between the error exponents of two messages in a two message erasure free block-code with feedback. Berlekamp mentions this result in passing in [1] and attributes it to Gallager and Shannon.

Lemma 3: For any feedback encoding scheme with two messages and erasure free decision rule and $T \geq T_0$:

either P el > i e -nT+v^4inP roi „ or Pe2 > i e -nr ( T )+v ^4inP mi „_ (63) 

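where $P_{min} = \min_{x, y:\, W(y|x) > 0} W(y|x)$,

$$T_0 \triangleq \max_{x, \tilde{x}} -\ln \sum_{y:\, W(y|\tilde{x}) > 0} W(y|x) \qquad (64)$$

$$\Gamma(T) \triangleq \max_{\Pi} \Gamma(T, \Pi). \qquad (65)$$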

The result is old and somewhat intuitive to those who are familiar with the calculations in the non-feedback case. Thus it has probably been proven a number of times. But we are not aware of a published proof, hence we have included one in Appendix A.

Although Lemma 3 establishes only the converse part, $(T, \Gamma(T))$ is indeed the optimal tradeoff for the error exponents of two messages in an erasure free block-code, both with and without feedback. Achievability of this tradeoff has already been established in [29, Theorem 5] for the case without feedback; evidently this implies the achievability with feedback. Furthermore $T_0$ does have an operational meaning: it is the maximum error exponent the first message can have while the second message has zero error probability. This fact is also proved in Appendix A.
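The threshold $T_0$ of equation (64) depends only on the support pattern of the channel and can be computed directly. A minimal sketch, with the channel given as a row-stochastic matrix `W[x][y]`; the function name and the example channels are illustrative assumptions:

```python
import math

def t_zero(W):
    """T_0 = max over (x, x~) of -ln sum_{y: W[x~][y] > 0} W[x][y], in nats."""
    best = 0.0
    for wx in W:        # row of the input x
        for wz in W:    # row of the input x~
            mass = sum(p for p, q in zip(wx, wz) if q > 0)
            value = math.inf if mass == 0 else -math.log(mass)
            best = max(best, value)
    return best
```

For a channel with all transition probabilities positive, such as a BSC, every inner sum equals 1 and $T_0 = 0$. For a Z-channel with rows `[1.0, 0.0]` and `[0.1, 0.9]` the pair with $x$ equal to the second input and $\tilde{x}$ equal to the first gives $T_0 = -\ln 0.1$.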

For some channels Lemma 3 gives us a bound on the error exponent of erasure free codes at zero rate, which is tighter than Haroutunian's bound at zero rate. In order to see this let us first define $T^*$ to be

$$T^* = \max_{T} \min\{T, \Gamma(T)\}.$$

Note that $T^*$ is finite iff $\sum_y W(y|x) W(y|\tilde{x}) > 0$ for all $x, \tilde{x}$ pairs. Recall that this is also the necessary and sufficient condition for the zero-error capacity, $C_0$, to be zero. $E_H(R)$, on the other hand, is infinite for all $R < R_\infty$, like $E_{sp}(R)$, where $R_\infty$ is given by,

$$R_\infty = -\min_{P(\cdot)} \max_{y} \ln \sum_{x:\, W(y|x) > 0} P(x).$$

Even in the cases where $E_H(0)$ is finite, $E_H(0) \geq T^*$. We can use this fact, Lemma 3 and Theorem 2, or [29, Theorem 1] for that matter, to strengthen Haroutunian's bound at low rates, as follows.

Lemma 4: For all channels with zero zero-error capacity, $C_0 = 0$, and any sequence of $M_n$ such that $\lim_{n \to \infty} \frac{\ln M_n}{n} = R$,

$$\lim_{n \to \infty} \frac{-\ln \mathcal{P}_e(M_n, n, 1, 0)}{n} \leq \tilde{E}_H(R) \qquad (66)$$

where

$$\tilde{E}_H(R) = \begin{cases} E_H(R) & \text{if } R \geq R_{ht} \\[1mm] T^* + \dfrac{E_H(R_{ht}) - T^*}{R_{ht}} R & \text{if } R \in [0, R_{ht}) \end{cases}$$

and $R_{ht}$ is the unique solution of the equation $T^* = E_H(R) - R E'_H(R)$ if it exists, $C$ otherwise.
Before going into the proof let us note that $\tilde{E}_H(R)$ is obtained simply by drawing the tangent line to the curve $(R, E_H(R))$ from the point $(0, T^*)$. The curve $(R, \tilde{E}_H(R))$ is the same as the tangent line for the rates between 0 and $R_{ht}$, and it is the same as the curve $(R, E_H(R))$ from then on, where $R_{ht}$ is the rate of the point at which the tangent from $(0, T^*)$ meets the curve $(R, E_H(R))$.

Proof: For $R \geq R_{ht}$ this lemma immediately follows from Haroutunian's result in [16] for $L_n = 1$. If $R < R_{ht}$ then we apply Theorem 2

$$\mathcal{P}_e(M, n, L, P_x) \geq \mathcal{P}_e(M, n_1, L_1, s)\; \mathcal{P}_e\left(L_1 + 1,\ n - n_1,\ L,\ \frac{(1-s) P_x}{\mathcal{P}_e(M, n_1, L_1, s)}\right) \qquad (67)$$

with^{14} $s = 0$, $P_x = 0$, $L = 1$, $L_1 = 1$ and $n_1 = \lfloor \frac{R}{R_{ht}} n \rfloor$. On the other hand as a result of Lemma 3 and the definition of $T^*$ we

have, 

7> e (2,n - n,L,0) > e-w'+^r^ (68) 

14 Or [29, Theorem 1] with $L_1 = 1$ and $n_1 = \lfloor \frac{R}{R_{ht}} n \rfloor$.
Using equations (67) and (68) we get,

-lnP e (M,n,l,0) < -lnP e (M,n,l,i[i /; 



Rht 



+ 



tint n 



T* + 



tint 



where $\frac{\ln M}{n_1} = R_{ht}$. The lemma follows by simply applying Haroutunian's result to the first term on the right hand side.



C. Generalized Straight Line Bound for Error-Erasure Exponents 

Theorem 2 bounds the minimum error probability of length $n$ block-codes from below in terms of the minimum error probabilities of length $n_1$ and length $(n - n_1)$ block-codes. The rate and erasure probability of the longer code constrain the rates and erasure probabilities of the shorter ones, but do not specify them completely. We use this fact together with the improved Haroutunian bound on the error exponents of erasure free block-codes with feedback, i.e. Lemma 4, and the error exponent tradeoff of the erasure free feedback block-codes with two messages, i.e. Lemma 3, to obtain a family of upper bounds on the error exponents of feedback block-codes with erasures.

Theorem 3: For any DMC with $C_0 = 0$, rate $R \in [0, C]$ and $E_x \in [0, \tilde{E}_H(R)]$, and for any $r \in [r_h(R, E_x), C]$

$$\mathcal{E}_e(R, E_x) \leq \frac{R}{r} \tilde{E}_H(r) + \left(1 - \frac{R}{r}\right) \Gamma\left(\frac{E_x - \frac{R}{r} \tilde{E}_H(r)}{1 - \frac{R}{r}}\right)$$

where $r_h(R, E_x)$ is the unique solution of $R \tilde{E}_H(r) - r E_x = 0$.

Theorem 3 simply states that any line connecting any two points of the curves $(R, E_x, E_e) = (R, \tilde{E}_H(R), \tilde{E}_H(R))$ and $(R, E_x, E_e) = (0, E_x, \Gamma(E_x))$ lies above the surface $(R, E_x, E_e) = (R, E_x, \mathcal{E}_e(R, E_x))$. The condition $C_0 = 0$ is not merely a technical condition due to the proof technique; as we will see in Section V, for channels with $C_0 > 0$ there are zero-error codes with erasure exponent as high as $E_{sp}(R)$ for any rate $R \leq C$.
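The geometric reading of Theorem 3 can be turned into a small evaluation routine. The sketch below assumes the bound has the form $\frac{R}{r}\tilde{E}_H(r) + (1 - \frac{R}{r})\Gamma(T)$ with $T = (E_x - \frac{R}{r}\tilde{E}_H(r))/(1 - \frac{R}{r})$, which is what a line through $(r, \tilde{E}_H(r), \tilde{E}_H(r))$ and $(0, T, \Gamma(T))$ gives at rate $R$. The exponent functions are passed in as callables; the ones used in the test are toy placeholders, not exponents of any real channel.

```python
def straight_line_bound(R, E_x, r, E_H, Gamma):
    """Evaluate (R/r) * E_H(r) + (1 - R/r) * Gamma(T) for a single choice of r > R > 0.

    E_H and Gamma are user supplied functions standing in for the improved
    Haroutunian bound and the two message tradeoff curve, respectively.
    """
    lam = R / r
    if not 0.0 < lam < 1.0:
        raise ValueError("need 0 < R < r")
    T = (E_x - lam * E_H(r)) / (1.0 - lam)
    if T < 0.0:
        raise ValueError("r is below r_h(R, E_x): erasure exponent budget is negative")
    return lam * E_H(r) + (1.0 - lam) * Gamma(T)
```

Minimizing the returned value over admissible $r$ gives the tightest member of the family of bounds.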

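Proof: Let us consider Theorem 2 for $s = 0$, $L = 1$, $L_1 = 1$, take the logarithm of both sides of equation (46) and divide by $n$,

$$\frac{-\ln \mathcal{P}_e(M, n, 1, P_x)}{n} \leq \left(\frac{n_1}{n}\right) \frac{-\ln \mathcal{P}_e(M, n_1, 1, 0)}{n_1} + \left(1 - \frac{n_1}{n}\right) \frac{-\ln \mathcal{P}_e\left(2,\ n - n_1,\ 1,\ \frac{P_x}{\mathcal{P}_e(M, n_1, 1, 0)}\right)}{n - n_1}$$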

Let us assume for the moment that, 

•PoC2.n.l./' x i ~ n] 
where P m %n is the minimum non-zero entry of W. 



\nV B 2,n-m,l. 



V B (M, ni,l,0) / 



In P x In 16 , In P„ 
(69)



(70) 



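If we set $M = \lfloor e^{nR} \rfloor$, $P_x = e^{-nE_x}$, $n_1 = \lfloor \frac{R}{r} n \rfloor$ in (69), use the bound given in (70) and take the limit as $n$ goes to infinity we get

$$\mathcal{E}_e(R, E_x) \leq \frac{R}{r} \mathcal{E}_e(r) + \left(1 - \frac{R}{r}\right) \Gamma\left(\frac{r E_x - R\, \mathcal{E}_e(r)}{r - R}\right)$$

Then Theorem 3 follows from Lemma 4 and the fact that $\Gamma(T)$ is a non-increasing function of $T$.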
In order to finish the proof we need to prove d70l . Note that if 



lnP x 



In 16 
n 



then the claim holds trivially, because $\Gamma(T) = \infty$ for $T < T_0$. For the case

-lnP x 



111 16 I In P m in \ T 







We prove the claim by contradiction. Let us assume that what we have claimed is wrong. Then there exists a $T_0 < T \leq T^*$ such that

-nr(T) + V:inP mjn -nT+VnlnP . 



V e (M,n,L,P x ) < 



16 



and 



Then there exists a block-code with erasures that satisfies 

-nr(T) + Vnln P mirL 



P 
P 



T(m) 
f (m) 



in 



< 2^ 

< 2^ 



16 

nr(T) + Vn In P„ 



16 



P[T(x)| to] < T- 
P[T(x)| fii) < 2 l - 



16 



16 
16 
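Let us enlarge the decoding region of $m$ by taking its union with the erasure region:

\[ \Gamma'(m) = \Gamma(m) \cup \Gamma(\mathbf{x}), \qquad \Gamma'(\tilde{m}) = \Gamma(\tilde{m}), \qquad \Gamma'(\mathbf{x}) = \emptyset. \]

The resulting code is an erasure-free code with

\[ P[\Gamma'(\tilde{m}) | m] \le 2\, \frac{e^{-n\Gamma(T) + \sqrt{n} \ln P_{\min}}}{16} \quad \text{and} \quad P[\Gamma'(m) | \tilde{m}] \le 2\, \frac{e^{-n\Gamma(T) + \sqrt{n} \ln P_{\min}}}{16} + 2\, \frac{e^{-nT + \sqrt{n} \ln P_{\min}}}{16}. \]

Since $T_0 < T < T^*$ implies $\Gamma(T) > T$, this contradicts Lemma 3; thus equation (70) holds. ■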
Note that we have set $L_1 = 1$ in the proof, but we could have set it to any subexponential function of the block length that diverges as $n$ diverges. By doing so we would have replaced $\Gamma(T)$ with $\mathcal{E}_e(0, E_x)$, while keeping the term including $E_H(R)$ the same. Since the best known upper bound for $\mathcal{E}_e(0, E_x)$ is $\Gamma(E_x)$ for $E_x \le T^*$, the final result is the same for the case with feedback.$^{15}$ On the other hand, for the case without feedback, which is not the main focus of this paper, this does make a difference. By choosing $L_1$ to be a subexponential function of the block length one can use Telatar's converse result [30, Theorem 4.4] on the error exponent at zero rate and zero erasure exponent without feedback.

In Figure 1 the upper and lower bounds we have derived for the error exponent are plotted as a function of the erasure exponent for a binary symmetric channel with crossover probability $\epsilon = 0.25$ at rate $R = 8.62 \times 10^{-2}$ nats per channel use. Solid lines are lower bounds to the error exponent for block-codes with feedback, which were established in Section III, and without feedback, which were established previously [10], [9], [30]. Dashed lines are the upper bounds obtained using Theorem 3. Note that all four curves meet at a point on the bottom right. This is the point that corresponds to the error exponent of block-codes at rate $R = 8.62 \times 10^{-2}$ nats per channel use, and its value is the same with and without feedback since we are on a symmetric channel and our rate is over the critical rate. Any point to the lower right of this point is achievable both with and without feedback.

Fig. 1. Error Exponent vs Erasure Exponent at R = 0.0862 nats for a BSC with ε = 0.25 (horizontal axis: erasure exponent).
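The claim that the plotted rate lies above the critical rate can be checked numerically. The following is a minimal sketch using the standard closed-form expressions for the capacity, critical rate and sphere packing exponent of a BSC (in nats); the function names are illustrative and do not come from the paper.

```python
import math

def binary_entropy(p):
    """Binary entropy in nats."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def kl_binary(a, b):
    """D(a || b) in nats for Bernoulli distributions with 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def bsc_capacity(eps):
    return math.log(2) - binary_entropy(eps)

def bsc_critical_rate(eps):
    # Crossover level at which the random coding exponent stops matching E_sp.
    delta = math.sqrt(eps) / (math.sqrt(eps) + math.sqrt(1 - eps))
    return math.log(2) - binary_entropy(delta)

def bsc_sphere_packing(rate, eps, tol=1e-12):
    # Bisection for delta in [eps, 1/2] with ln 2 - h(delta) = rate,
    # then E_sp(rate) = D(delta || eps).
    lo, hi = eps, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.log(2) - binary_entropy(mid) > rate:
            lo = mid
        else:
            hi = mid
    return kl_binary((lo + hi) / 2, eps)

eps, rate = 0.25, 8.62e-2
print(bsc_critical_rate(eps), rate, bsc_capacity(eps))
print(bsc_sphere_packing(rate, eps))
```

For $\epsilon = 0.25$ the rate $0.0862$ falls strictly between the critical rate and the capacity, which is the regime in which the sphere packing exponent is the error exponent with and without feedback.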



15 In binary symmetric channels these results can be strengthened using the value of $\mathcal{E}(0)$ [34]. However those changes improve the upper bound on the error exponent only at low rates and high erasure exponents.
V. Erasure Exponent of Error-Free Codes: $\mathcal{E}_x(R)$

For all DMCs which have one or more zero probability transitions, for all rates below capacity, $R < C$, and for small enough $E_x$, $\mathcal{E}_e(R, E_x) = \infty$. For such $(R, E_x)$ pairs, the coding scheme we have described in Section III gives us an error-free code. The connection between the erasure exponent of error-free block-codes and the error exponent of block-codes with erasures is not confined to this particular encoding scheme. In order to explain those connections in more detail, let us first define error-free codes more formally.

Definition 3: A sequence $Q_0$ of block-codes with feedback is an error-free reliable sequence iff

\[ P_e^{(n)} = 0 \quad \forall n, \qquad \text{and} \qquad \limsup_{n \to \infty} \left( P_x^{(n)} + \frac{1}{|\mathcal{M}^{(n)}|} \right) = 0. \]

The highest rate achievable for error-free reliable codes is the zero-error capacity with feedback and erasures, $C_{x,0}$.

If all the transition probabilities are positive, i.e. $\min_{x,y} W(y|x) = \delta > 0$, then $P[y^n | m] \ge \delta^n$ for all $m \in \mathcal{M}$ and $y^n \in \mathcal{Y}^n$. Consequently $C_{x,0}$ is zero. On the other hand, as an immediate consequence of the encoding scheme suggested by Yamamoto and Itoh in [32], if there is one or more zero probability transitions, $C_{x,0}$ is equal to the channel capacity $C$.

Definition 4: For all DMCs with at least one $(x, y)$ pair such that $W(y|x) = 0$, $\forall R < C$ the erasure exponent of error-free block-codes with feedback is defined as

\[ \mathcal{E}_x(R) = \sup_{Q_0 : R(Q_0) \ge R} E_x(Q_0). \qquad (71) \]

For any erasure exponent $E_x$ less than $\mathcal{E}_x(R)$, there is an error-free reliable sequence, i.e. there is a reliable sequence with infinite error exponent:

\[ E_x < \mathcal{E}_x(R) \Rightarrow \mathcal{E}_e(R, E_x) = \infty. \]
More interestingly, if $E_x > \mathcal{E}_x(R)$ then $\mathcal{E}_e(R, E_x) < \infty$. In order to see this, let $\delta$ be the minimum non-zero transition probability. Note that if $P[y^n | m] \ne 0$ then $P[y^n | m] \ge \delta^n$. Thus if $P_e^{(n)} \ne 0$, then $P_e^{(n)} \ge \delta^n e^{-nR}$, i.e. $\frac{-\ln P_e^{(n)}}{n} \le R - \ln \delta$. However if $E_x > \mathcal{E}_x(R)$ then there is no error-free reliable sequence at rate $R$ with erasure exponent $E_x$. Thus $P_e^{(n)} > 0$ for infinitely many $n$ in any reliable sequence, and the error exponents of all of those codes are bounded above by a finite number. Consequently,

\[ E_x > \mathcal{E}_x(R) \Rightarrow \mathcal{E}_e(R, E_x) < \infty. \]

In a sense, like the error exponent of erasure-free block-codes, $\mathcal{E}(R)$, the erasure exponent of error-free block-codes, $\mathcal{E}_x(R)$, gives a partial description of $\mathcal{E}_e(R, E_x)$. $\mathcal{E}(R)$ gives the value of the error exponent below which the erasure exponent can be pushed to infinity, and $\mathcal{E}_x(R)$ gives the value of the erasure exponent below which the error exponent can be pushed to infinity.

Below, the erasure exponent of zero-error codes, $\mathcal{E}_x(R)$, is investigated separately for two families of channels: channels which have a positive zero-error capacity, i.e. $C_0 > 0$, and channels which have zero zero-error capacity, i.e. $C_0 = 0$.

A. Case 1: $C_0 > 0$

Theorem 4: For a DMC, if $C_0 > 0$ then

\[ E_H(R) \ge \mathcal{E}_x(R) \ge E_{sp}(R). \]

Proof: If the zero-error capacity is strictly greater than zero, i.e. $C_0 > 0$, then one can achieve the sphere packing exponent with zero error probability using a two phase scheme. In the first phase the transmitter uses a length $n_1$ block-code without feedback with $\lceil e^{n_1 R} \rceil$ messages and a list decoder of size $L = \lceil -\frac{\partial}{\partial R} E_{sp}(R, P_R^*) \rceil$, where $P_R^*$ is the input distribution satisfying $E_{sp}(R) = E_{sp}(R, P_R^*)$. Note that with this list size the sphere packing exponent$^{16}$ is achievable at rate $R$. Thus the correct message is in the list with probability at least $(1 - e^{-n_1 E_{sp}(R)})$, see [9, Page 196]. In the second phase the transmitter uses a zero-error code of length$^{17}$ $n_2 = \lceil \frac{\ln(L+1)}{C_0} \rceil$ with $L + 1$ messages to tell the receiver whether the correct message is in that list or not, and the correct message itself if it is in the list. Clearly such a feedback code with two phases is error free, and it has erasures only when there is an error in the first phase. Thus the erasure probability of the overall code is upper bounded by $e^{-n_1 E_{sp}(R)}$. Note that $n_2$ is fixed for a given $R$. Consequently, as the length of the first phase, $n_1$, diverges, the rate and erasure exponent of the $(n_1 + n_2)$ long block-code converge to the rate and error exponent of the $n_1$ long code of the first phase, i.e. to $R$ and $E_{sp}(R)$. Thus

\[ \mathcal{E}_x(R) \ge E_{sp}(R). \]

16 Indeed this upper bound on error probability is tight exponentially for block-codes without feedback.
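The convergence argument above can be illustrated numerically. This is a minimal sketch, assuming a first phase that carries $e^{n_1 R}$ messages with list-error probability $e^{-n_1 E_{sp}(R)}$ and a second phase of fixed length $n_2$; the function name and the numerical values are hypothetical.

```python
def two_phase_parameters(n1, n2, rate, e_sp):
    """Rate and erasure-exponent guarantee of the (n1 + n2)-long two phase code."""
    n = n1 + n2
    overall_rate = n1 * rate / n       # ln(number of messages) / total length
    overall_exponent = n1 * e_sp / n   # -ln(e^{-n1 * e_sp}) / total length
    return overall_rate, overall_exponent

# Hypothetical values: R = 0.05 nats, E_sp(R) = 0.01, second phase of 7 channel uses.
for n1 in (10, 1000, 100000):
    print(n1, two_phase_parameters(n1, 7, 0.05, 0.01))
```

Because $n_2$ does not grow with $n_1$, both quantities approach $R$ and $E_{sp}(R)$ at rate $O(n_2 / n_1)$.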

Any error-free block-code with erasures can be forced to decode at erasures. The resulting fixed length code has an error probability no larger than the erasure probability of the original code. However we know that [16] the error probability of erasure-free block-codes with feedback decreases with an exponent no larger than $E_H(R)$. Thus,

\[ \mathcal{E}_x(R) \le E_H(R). \]

This upper bound on the erasure exponent also follows from the converse result we present in the next section, Theorem 6. ■
For symmetric channels $E_H(R) = E_{sp}(R)$, and Theorem 4 determines the erasure exponent of error-free codes on symmetric channels with non-zero zero-error capacity completely.

B. Case 2: $C_0 = 0$

This case is more involved than the previous one. We first establish an upper bound on £ X {R) in terms of the 
improved version of Haroutunian's bound, i.e. Lemma |U and the erasure exponent of error-free codes at zero rate, 
$\mathcal{E}_x(0)$. Then we show that $\mathcal{E}_x(0)$ is equal to the erasure exponent of error-free block-codes with two messages, $\mathcal{E}_{x,2}$, and bound $\mathcal{E}_{x,2}$ from below.

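For any $M$, $n$ and $L$, $\mathcal{P}_e(M, n, L, P_x) = 0$ for large enough $P_x$. We denote the minimum of such $P_x$'s by $\mathcal{P}_{0,x}(M, n, L)$. Thus we can write $\mathcal{E}_{x,2}$ as

\[ \mathcal{E}_{x,2} = \liminf_{n \to \infty} \frac{-\ln \mathcal{P}_{0,x}(2, n, 1)}{n}. \]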

Theorem 5: For any $n$, $M$, $L$, $n_1 \le n$ and $L_1$, the minimum erasure probability of fixed length error-free block-codes with feedback, $\mathcal{P}_{0,x}(M, n, L)$, satisfies

\[ \mathcal{P}_{0,x}(M, n, L) \ge \mathcal{P}_e(M, n_1, L_1, 0)\, \mathcal{P}_{0,x}(L_1 + 1, n - n_1, L). \qquad (72) \]

Like Theorem 2, Theorem 5 is correct both with and without feedback. Although the $\mathcal{P}_{0,x}$'s and $\mathcal{P}_e$'s will be different in each case, the relationship between them given in equation (72) holds in both cases.

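Proof: If $\mathcal{P}_e(M, n_1, L_1, 0) = 0$ the theorem holds trivially. Thus we assume henceforth that $\mathcal{P}_e(M, n_1, L_1, 0) > 0$. Using Theorem 2 with $P_x = \mathcal{P}_{0,x}(M, n, L)$ we get

\[ \mathcal{P}_e(M, n, L, \mathcal{P}_{0,x}(M, n, L)) \ge \mathcal{P}_e(M, n_1, L_1, 0)\, \mathcal{P}_e\!\left(L_1 + 1, n - n_1, L, \frac{\mathcal{P}_{0,x}(M, n, L)}{\mathcal{P}_e(M, n_1, L_1, 0)}\right). \]

Since $\mathcal{P}_e(M, n, L, \mathcal{P}_{0,x}(M, n, L)) = 0$ and $\mathcal{P}_e(M, n_1, L_1, 0) > 0$ we have,

\[ \mathcal{P}_e\!\left(L_1 + 1, n - n_1, L, \frac{\mathcal{P}_{0,x}(M, n, L)}{\mathcal{P}_e(M, n_1, L_1, 0)}\right) = 0. \]

Thus

\[ \frac{\mathcal{P}_{0,x}(M, n, L)}{\mathcal{P}_e(M, n_1, L_1, 0)} \ge \mathcal{P}_{0,x}(L_1 + 1, n - n_1, L). \]

■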

As we have done in the errors and erasures case we can convert this into a bound on exponents. If we use the 
improved version of Haroutunian's bound, i.e. Lemma 01 as an upper bound on the error exponent of erasure free 
block-codes we get the following. 

17 For some DMCs with $C_0 > 0$ and for some $L$ one may need more than $\lceil \frac{\ln(L+1)}{C_0} \rceil$ time units to convey one of the $(L + 1)$ messages without any errors, because $C_0$ itself is defined as a limit. But even in those cases we are guaranteed to have a fixed amount of time for that transmission, which does not change with $n_1$. Thus the above argument holds as is even in those cases.
Theorem 6: For any rate $R \ge 0$ and for any $\alpha \in [\frac{R}{C}, 1]$,

\[ \mathcal{E}_x(R) \le \alpha E_H\!\left(\frac{R}{\alpha}\right) + (1 - \alpha)\, \mathcal{E}_x(0). \]

Now let us focus on the value of the erasure exponent at zero rate:

Lemma 5: For channels which have zero zero-error capacity, i.e. $C_0 = 0$, the erasure exponent of error-free block-codes at zero rate, $\mathcal{E}_x(0)$, is equal to the erasure exponent of error-free block-codes with two messages, $\mathcal{E}_{x,2}$.
Note that unlike the two message case, $\mathcal{E}_{x,2}$, in the zero rate case, $\mathcal{E}_x(0)$, the number of messages increases to infinity with the block length; thus we cannot claim their equality just as a result of their definitions.
Proof: If we write Theorem 5 for $L = 1$, $n_1 = 0$ and $L_1 = 1$ we get

\[ \mathcal{P}_{0,x}(M, n, 1) \ge \mathcal{P}_e(M, 0, 1, 0)\, \mathcal{P}_{0,x}(2, n, 1) = \frac{M - 1}{M}\, \mathcal{P}_{0,x}(2, n, 1) \qquad \forall M, n. \]

Thus as an immediate result of the definitions of $\mathcal{E}_x(0)$ and $\mathcal{E}_{x,2}$, we have $\mathcal{E}_x(0) \le \mathcal{E}_{x,2}$.

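In order to prove the equality one needs to prove $\mathcal{E}_x(0) \ge \mathcal{E}_{x,2}$. To do that, let us assume that it is possible to send one bit with erasure probability $\epsilon$ with a block-code of length $\ell(\epsilon)$:

\[ \epsilon \ge \mathcal{P}_{0,x}(2, \ell(\epsilon), 1). \qquad (73) \]

One can use this code to send $r$ bits, by repeating each bit whenever there is an erasure. If the block length is $n = k \ell(\epsilon)$ then a message erasure occurs only when the number of bit erasures in $k$ trials is more than $k - r$. Let $\#e$ denote the number of erasures out of $k$ trials; then

\[ P[\#e = l] = \frac{k!}{l!\,(k - l)!} (1 - \epsilon)^{k - l} \epsilon^{l} \qquad \text{and} \qquad P_x = \sum_{l = k - r + 1}^{k} P[\#e = l]. \]

Thus

\[ P_x = \sum_{l = k - r + 1}^{k} \frac{k!}{l!\,(k - l)!} \left(\frac{l}{k}\right)^{l} \left(1 - \frac{l}{k}\right)^{k - l} e^{-k \left[ \frac{l}{k} \ln \frac{l/k}{\epsilon} + \left(1 - \frac{l}{k}\right) \ln \frac{1 - l/k}{1 - \epsilon} \right]} = \sum_{l = k - r + 1}^{k} \frac{k!}{l!\,(k - l)!} \left(\frac{l}{k}\right)^{l} \left(1 - \frac{l}{k}\right)^{k - l} e^{-k D\left(\frac{l}{k} \| \epsilon\right)}. \]

Then for any $\epsilon < 1 - \frac{r}{k}$, we have

\[ P_x \le e^{-k D\left(1 - \frac{r}{k} \| \epsilon\right)}. \]

Evidently $P_x \ge \mathcal{P}_{0,x}(2^r, n, 1)$ for $n = k \ell(\epsilon)$. Thus,

\[ \frac{-\ln \mathcal{P}_{0,x}(2^r, n, 1)}{n} \ge \frac{D\left(1 - \frac{r}{k} \| \epsilon\right)}{\ell(\epsilon)}. \]

Then for any sequence of $(r, k)$'s such that $\lim_{k \to \infty} \frac{r}{k} = 0$ we have $\mathcal{E}_x(0) \ge \frac{-\ln \epsilon}{\ell(\epsilon)}$. Thus any exponent achievable for the two message case is achievable for the zero rate case: $\mathcal{E}_x(0) \ge \mathcal{E}_{x,2}$. ■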
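The bound on the message erasure probability of the repetition scheme is a Chernoff bound on a binomial tail, and it can be sanity-checked numerically. The sketch below assumes independent bit erasures with probability $\epsilon$ per trial; the function names and parameter values are illustrative only.

```python
import math

def kl_bernoulli(a, b):
    """D(a || b) in nats for Bernoulli distributions, with 0 ln 0 = 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total

def message_erasure_probability(k, r, eps):
    """Exact P[#e >= k - r + 1] for k independent trials, each erased w.p. eps."""
    return sum(math.comb(k, l) * eps**l * (1 - eps) ** (k - l)
               for l in range(k - r + 1, k + 1))

def chernoff_bound(k, r, eps):
    """exp(-k D(1 - r/k || eps)), valid when eps < 1 - r/k."""
    return math.exp(-k * kl_bernoulli(1 - r / k, eps))

k, r, eps = 50, 5, 0.2
print(message_erasure_probability(k, r, eps), chernoff_bound(k, r, eps))
```

As $r/k \to 0$ the exponent $D(1 - r/k \,\|\, \epsilon)$ approaches $-\ln \epsilon$, which is the per-block exponent used in the last step of the proof.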
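As a result of Lemma 6, which is presented in the next section, we know that

\[ \mathcal{P}_{0,x}(2, n, 1) \ge \frac{1}{2} \Big( \sup_{s \in (0, .5)} \beta(s) \Big)^{n} \qquad \text{where} \qquad \beta(s) = \min_{x, \tilde{x}} \sum_{y} W(y|x)^{1-s}\, W(y|\tilde{x})^{s}. \]

Thus as a result of Lemma 5 we have

\[ \mathcal{E}_x(0) = \mathcal{E}_{x,2} \le -\ln \sup_{s \in (0, .5)} \beta(s). \]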
C. Lower Bounds on $\mathcal{P}_{0,x}(2, n, 1)$

Suppose at time $t$ the correct message, $M$, is assigned to the input letter $x$ and the other message is assigned to the input letter $\tilde{x}$. Then the receiver cannot rule out the incorrect message at time $t$ with probability $\sum_{y : W(y|\tilde{x}) > 0} W(y|x)$. Using this fact one can prove that,

\[ \mathcal{P}_{0,x}(2, n, 1) \ge \Big( \min_{x, \tilde{x}} \sum_{y : W(y|\tilde{x}) > 0} W(y|x) \Big)^{n}. \qquad (74) \]



Now let us consider channels whose transition probability matrix $W$ is of the form

\[ W = \begin{bmatrix} 1 - q & q \\ 0 & 1 \end{bmatrix}. \]

Let us denote the output symbol reachable from both of the input letters by $\bar{y}$. If $Y^n$ is a sequence of $\bar{y}$'s then the receiver cannot decode without errors, i.e. it has to declare an erasure. Thus

\[ \mathcal{P}_{0,x}(2, n, 1) \ge \frac{1}{2} \left( P[Y^n = \bar{y}\bar{y} \ldots \bar{y} \,|\, M = 1] + P[Y^n = \bar{y}\bar{y} \ldots \bar{y} \,|\, M = 2] \right) \overset{(a)}{\ge} \sqrt{P[Y^n = \bar{y}\bar{y} \ldots \bar{y} \,|\, M = 1]\; P[Y^n = \bar{y}\bar{y} \ldots \bar{y} \,|\, M = 2]} \overset{(b)}{\ge} q^{\frac{n}{2}} \]

where (a) holds because the arithmetic mean is larger than the geometric mean, and (b) holds because

\[ P[Y_t = \bar{y} \,|\, M = 1, Y^{t-1}]\; P[Y_t = \bar{y} \,|\, M = 2, Y^{t-1}] \ge q \qquad \forall t. \]

Indeed this bound is tight. If the encoder assigns the first message to the input letter that always leads to $\bar{y}$ and the second message to the other input letter in the first $\lfloor \frac{n}{2} \rfloor$ time instances, and does the flipped assignment in the last $\lceil \frac{n}{2} \rceil$ time instances, then an erasure happens with a probability less than $q^{\lfloor \frac{n}{2} \rfloor}$.
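The tightness claim can be checked directly. This is a small sketch, assuming equiprobable messages and the two-input channel in which one letter always produces the shared output and the other produces it with probability $q$; the function names are illustrative.

```python
def flip_scheme_erasure_probability(n, q):
    """Erasure probability of the flipped-assignment scheme.

    An erasure occurs only if every output is the shared symbol. Each message
    uses the letter that produces it with probability q in exactly one of the
    two phases, and the letter that always produces it in the other phase.
    """
    first, last = n // 2, n - n // 2
    p_erasure_m1 = q ** last    # message 1 uses the noisy letter in the last phase
    p_erasure_m2 = q ** first   # message 2 uses the noisy letter in the first phase
    return 0.5 * (p_erasure_m1 + p_erasure_m2)

def geometric_mean_lower_bound(n, q):
    """The lower bound q^(n/2) obtained from steps (a) and (b)."""
    return q ** (n / 2)

for n in (4, 5, 11):
    print(n, geometric_mean_lower_bound(n, 0.3), flip_scheme_erasure_probability(n, 0.3))
```

For even $n$ the scheme meets the lower bound with equality; for odd $n$ it stays between $q^{n/2}$ and $q^{\lfloor n/2 \rfloor}$.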

Note that equation (74) bounds $\mathcal{P}_{0,x}(2, n, 1)$ only by $q^{n}$, rather than $q^{\frac{n}{2}}$. Using the insight from this example one can establish the following lower bound,

\[ \mathcal{P}_{0,x}(2, n, 1) \ge \frac{1}{2} \Big( \min_{x, \tilde{x}} \sum_{y} \sqrt{W(y|x)\, W(y|\tilde{x})} \Big)^{n}. \qquad (75) \]

However the bound given in equation (75) decays exponentially in $n$ even when all entries of $W$ are positive, i.e. even when $\mathcal{P}_{0,x}(2, n, 1) = 1$. In other words it is not superior to the bound given in equation (74). The following bound implies the bounds given in equations (74) and (75). Furthermore for certain channels it is strictly better than both.

Lemma 6: The erasure probability of error-free codes with two messages is lower bounded as

\[ \mathcal{P}_{0,x}(2, n, 1) \ge \frac{1}{2} \Big( \sup_{s \in (0, .5)} \beta(s) \Big)^{n} \qquad \text{where} \qquad \beta(s) = \min_{x, \tilde{x}} \sum_{y} W(y|x)^{1-s}\, W(y|\tilde{x})^{s}. \qquad (76) \]

Note that the bound in equation (74) is implied by the $\lim_{s \to 0^+} \beta(s)$ case, and the bound in equation (75) is implied by $\lim_{s \to 0.5^-} \beta(s)$.

Although $\sum_y W(y|x)^{s}\, W(y|\tilde{x})^{1-s}$ is convex in $s$ on $(0, 0.5)$ for all $(x, \tilde{x})$ pairs, $\beta(s)$ is not convex in $s$ because of the minimization in its definition. Thus the supremum over $s$ does not necessarily occur on the boundaries. Indeed there are channels for which the bound given in Lemma 6 is strictly better than the bounds given in (74) and (75). Following are the transition probabilities of one such channel.

\[ W = \begin{bmatrix} 0.1600 & 0.0200 & 0.2200 & 0.3000 & 0.3000 \\ 0.0900 & 0.4000 & 0.2700 & 0.0002 & 0.2398 \\ 0.1800 & 0.2000 & 0.3000 & 0.3200 & 0 \end{bmatrix} \]

\[ \beta(0) = 0.7, \qquad \beta(0.5) = 0.7027, \qquad \beta(0.18) = 0.7299. \]
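The three quoted values of $\beta(s)$ can be reproduced with a few lines of code. The sketch below assumes that the third row of $W$ has its zero entry in the last column, which is the placement consistent with $\beta(0.5) = 0.7027$, and it restricts each sum to outputs reachable from both input letters so that $s = 0$ gives the limiting value; the function name is illustrative.

```python
W = [
    [0.16, 0.02, 0.22, 0.30, 0.30],
    [0.09, 0.40, 0.27, 0.0002, 0.2398],
    [0.18, 0.20, 0.30, 0.32, 0.0],   # zero placed in the last column (assumption)
]

def beta(s, channel=W):
    """min over ordered pairs (x, x~) of sum_y W(y|x)^(1-s) W(y|x~)^s."""
    best = float("inf")
    for i, row_x in enumerate(channel):
        for j, row_xt in enumerate(channel):
            if i == j:
                continue
            total = sum(a ** (1 - s) * b ** s
                        for a, b in zip(row_x, row_xt) if a > 0 and b > 0)
            best = min(best, total)
    return best

for s in (0.0, 0.18, 0.5):
    print(s, round(beta(s), 4))
```

The interior value exceeds both boundary values because the minimizing input pair changes with $s$: one pair's sum increases in $s$ while another's decreases, and the two cross near $s = 0.18$.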
Proof: For any error-free code and for any $s \in (0, 0.5)$,

\[ \begin{aligned} P_x &\ge \frac{1}{2} \sum_{y^n : P[y^n | M=1] P[y^n | M=2] > 0} \left( P[y^n | M = 1] + P[y^n | M = 2] \right) \\ &\ge \frac{1}{2} \sum_{y^n : P[y^n | M=1] P[y^n | M=2] > 0} P[y^n | M = 1]^{1-s}\, P[y^n | M = 2]^{s} \\ &= \frac{1}{2} \sum_{y^{n-1} : P[y^{n-1} | M=1] P[y^{n-1} | M=2] > 0} P[y^{n-1} | M = 1]^{1-s}\, P[y^{n-1} | M = 2]^{s} \sum_{y_n : P[y^n | M=1] P[y^n | M=2] > 0} P[y_n | M = 1, y^{n-1}]^{1-s}\, P[y_n | M = 2, y^{n-1}]^{s} \\ &\ge \frac{1}{2} \sum_{y^{n-1} : P[y^{n-1} | M=1] P[y^{n-1} | M=2] > 0} P[y^{n-1} | M = 1]^{1-s}\, P[y^{n-1} | M = 2]^{s}\, \beta(s) \\ &\ge \frac{1}{2} \left( \beta(s) \right)^{n}. \end{aligned} \]

The lemma follows by taking the supremum over $s \in (0, 0.5)$. ■

VI. Discussion 

In the erasure-free case, the error exponent is not known for a general DMC. We do not even know if it is still upper bounded by the sphere packing exponent for non-symmetric DMCs. However for the case with erasures, at zero erasure exponent, the value of the error exponent has long been known [3], [32]. Our main aim was to establish upper and lower bounds that extend the bounds at the zero erasure exponent case gracefully and non-trivially to positive erasure exponents. Our results are best understood in this framework and should be interpreted accordingly.

We derived inner bounds using two phase encoding schemes, which are known to be optimal in the zero erasure exponent case. We have improved their performance at positive erasure exponent values by choosing the relative durations of the phases properly and by using an appropriate decoder. However within each phase the assignment of messages to input letters is fixed. In a general feedback encoder, on the other hand, the assignment of messages to input symbols at each time can depend on the previous channel outputs, and such encoding schemes have been proven to improve the error exponent at low rates [33], [12], [6], [23] for some DMCs. Using such an encoding in the communication phase will improve the performance at low rates. In addition, instead of committing to a fixed duration for the communication phase, one might consider using a stopping time to switch from the communication phase to the control phase. However in order to apply those ideas effectively for a general DMC, it seems one first needs to solve the problem for erasure-free block-codes for a general DMC.

We derived the outer bounds without making any assumption about the feedback encoding scheme. Thus they are 
valid for any fixed length block-code with feedback and erasures. The principal idea of the straight line bound is 
making use of the bounds derived for different rate and erasure exponent points by taking their convex combinations. 
This approach can be interpreted as a generalization of the outer bounds used for variable length block-codes, O, 
[2]. As was the case for the inner bounds, it seems that in order to improve the outer bounds one needs to establish outer bounds on some related problems, i.e. on the error exponents of erasure-free block-codes with feedback and on the error exponent, erasure exponent trade-off at zero rate.
Appendix

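A. The Error Exponent Trade-off for Feedback Encoding Schemes with Two Messages and Erasure Free Decoders:
Lemma 7: $\Gamma(T, \Pi)$ defined in equation (33) is also equal to

\[ \Gamma(T, \Pi) = \begin{cases} \infty & \text{if } T < D(U_0 \| W_a | \Pi) \\ D(U_s \| W_r | \Pi) & \text{if } T = D(U_s \| W_a | \Pi) \text{ for some } s \in [0, 1] \\ D(U_1 \| W_r | \Pi) & \text{if } T > D(U_1 \| W_a | \Pi) \end{cases} \]

where

\[ U_s(y | x, \tilde{x}) = \begin{cases} \dfrac{1_{\{W(y|\tilde{x}) > 0\}}\, W(y|x)}{\sum_{y' : W(y'|\tilde{x}) > 0} W(y'|x)} & \text{if } s = 0 \\[2ex] \dfrac{W(y|x)^{1-s}\, W(y|\tilde{x})^{s}}{\sum_{y'} W(y'|x)^{1-s}\, W(y'|\tilde{x})^{s}} & \text{if } s \in (0, 1) \\[2ex] \dfrac{1_{\{W(y|x) > 0\}}\, W(y|\tilde{x})}{\sum_{y' : W(y'|x) > 0} W(y'|\tilde{x})} & \text{if } s = 1 \end{cases} \]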

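Proof:

\[ \begin{aligned} \Gamma(T, \Pi) &= \min_{U : D(U \| W_a | \Pi) \le T} D(U \| W_r | \Pi) \\ &= \min_{U} \sup_{\lambda \ge 0}\; D(U \| W_r | \Pi) + \lambda \left( D(U \| W_a | \Pi) - T \right) \\ &\overset{(a)}{=} \sup_{\lambda \ge 0} \min_{U}\; D(U \| W_r | \Pi) + \lambda \left( D(U \| W_a | \Pi) - T \right) \\ &= \sup_{\lambda \ge 0} \min_{U}\; -\lambda T + (1 + \lambda) \sum_{x, \tilde{x}, y} \Pi(x, \tilde{x})\, U(y | x, \tilde{x}) \ln \frac{U(y | x, \tilde{x})}{W(y|x)^{\frac{\lambda}{1+\lambda}}\, W(y|\tilde{x})^{\frac{1}{1+\lambda}}} \\ &\overset{(b)}{=} \sup_{\lambda \ge 0}\; -\lambda T - (1 + \lambda) \sum_{x, \tilde{x}} \Pi(x, \tilde{x}) \ln \sum_{y} W(y|x)^{\frac{\lambda}{1+\lambda}}\, W(y|\tilde{x})^{\frac{1}{1+\lambda}} \end{aligned} \qquad (77) \]

where (a) follows from the convexity of $D(U \| W_r | \Pi) + \lambda (D(U \| W_a | \Pi) - T)$ in $U$ and its linearity (concavity) in $\lambda$; (b) holds because the minimizing $U$ is $U_s$ for $s = \frac{1}{1+\lambda}$. The function on the right hand side of (77) is maximized at a positive and finite $\lambda$ iff there is a $\lambda$ such that $D(U_{\frac{1}{1+\lambda}} \| W_a | \Pi) = T$. Thus by substituting $\lambda = \frac{1-s}{s}$ we get

\[ \Gamma(T, \Pi) = \begin{cases} \infty & \text{if } T < \lim_{s \to 0^+} D(U_s \| W_a | \Pi) \\ \lim_{s \to 0^+} D(U_s \| W_r | \Pi) & \text{if } T = \lim_{s \to 0^+} D(U_s \| W_a | \Pi) \\ D(U_s \| W_r | \Pi) & \text{if } T = D(U_s \| W_a | \Pi) \text{ for some } s \in (0, 1) \\ \lim_{s \to 1^-} D(U_s \| W_r | \Pi) & \text{if } T = \lim_{s \to 1^-} D(U_s \| W_a | \Pi) \\ \lim_{s \to 1^-} D(U_s \| W_r | \Pi) & \text{if } T > \lim_{s \to 1^-} D(U_s \| W_a | \Pi) \end{cases} \qquad (78) \]

The lemma follows from the definition of $U_s$ at $s = 0, 1$ and equation (78). ■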
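The parametric form of the trade-off in Lemma 7 can be traced numerically for a single input pair. The sketch below makes several assumptions that are not fixed by the text above: $\Pi$ puts all its mass on one pair $(x, \tilde{x})$, $W_a(\cdot | x, \tilde{x}) = W(\cdot | x)$, $W_r(\cdot | x, \tilde{x}) = W(\cdot | \tilde{x})$, and the two example distributions are hypothetical.

```python
import math

def tilted(p, q, s):
    """U_s proportional to p^(1-s) q^s on the common support, for s in (0, 1)."""
    weights = [a ** (1 - s) * b ** s if a > 0 and b > 0 else 0.0
               for a, b in zip(p, q)]
    total = sum(weights)
    return [w / total for w in weights]

def kl(u, w):
    """D(u || w) in nats; assumes the support of u is inside the support of w."""
    return sum(a * math.log(a / b) for a, b in zip(u, w) if a > 0)

def tradeoff_point(p, q, s):
    """Return (T, Gamma) = (D(U_s || p), D(U_s || q)) for p = W(.|x), q = W(.|x~)."""
    u = tilted(p, q, s)
    return kl(u, p), kl(u, q)

p = [0.7, 0.2, 0.1]   # hypothetical W(.|x)
q = [0.1, 0.3, 0.6]   # hypothetical W(.|x~)
for k in (1, 5, 10, 15, 19):
    print(k / 20, tradeoff_point(p, q, k / 20))
```

As $s$ moves from 0 to 1 the first exponent increases while the second decreases, which is the trade-off curve that the middle case of the lemma describes.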
Now we are ready to present the proof of Lemma 3.

Proof [Lemma 3]:

Our proof is very much like the one for the converse part of [29, Theorem 5], except for a few modifications that allow us to handle the fact that the encoding schemes we are considering are feedback encoding schemes. Like [29, Theorem 5] we construct a probability measure $P_T[\cdot]$ on $\mathcal{Y}^n$ as a function of $T$ and the encoding scheme. Then we bound the error probability of each message from below using the probability of the decoding region of the other message under $P_T[\cdot]$.

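For any $T \ge T_0$ and $\Pi$, let $S_{T,\Pi}$ be

\[ S_{T,\Pi} = \begin{cases} 0 & \text{if } T < D(U_0 \| W_a | \Pi) \\ s & \text{if } \exists s \in [0, 1] \text{ s.t. } D(U_s \| W_a | \Pi) = T \\ 1 & \text{if } T > D(U_1 \| W_a | \Pi) \end{cases} \qquad (79) \]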

Recall that

\[ T_0 = \max_{x, \tilde{x}} \Big( -\ln \sum_{y : W(y|\tilde{x}) > 0} W(y|x) \Big) \qquad \text{and} \qquad D(U_0 \| W_a | \Pi) = -\sum_{x, \tilde{x}} \Pi(x, \tilde{x}) \ln \sum_{y : W(y|\tilde{x}) > 0} W(y|x). \]

Thus

\[ T_0 \ge D(U_0 \| W_a | \Pi). \qquad (80) \]

Thus as a result of the definition of $S_{T,\Pi}$ and equation (80) we have

\[ D(U_{S_{T,\Pi}} \| W_a | \Pi) \le T \qquad \forall T \ge T_0. \qquad (81) \]

Using Lemma 7, the definition of $S_{T,\Pi}$ and equation (80) we can also conclude that

\[ D(U_{S_{T,\Pi}} \| W_r | \Pi) = \Gamma(T, \Pi) \le \Gamma(T) \qquad \forall T \ge T_0. \qquad (82) \]

Note that for a feedback encoding schemes with two messages at time t, X t (-) : {mi, ^2} x y , given the the 
past channel outputs, y* _1 channel inputs for each message (X t (mi, y t_1 ) and X t (m2, y* -1 )) are fixed. Thus there 
is a corresponding n 

n/^fO if (x,x)/(X t (m 1 , 2 /'- 1 ),Xi(m 2 ,y*- 1 )) 1 
l ' if (x,x) = (X t (m 1 ,y'- 1 ),Xi(m 2 ,y*- 1 )) J' ^ 

Then for any T >T let P r [y t \ y'" 1 ] be 

[iftl y'- 1 } = Us^MXtimuy^^Xt^y 1 - 1 )). (84) 

Note that as a result of equation ( f8TT > and equation ( f82l we have 

E yt p r [ml y'- 1 ] ^ PtStelS^ ^ T and ^ M ln p^feK^ ^ r ( T ) 

Now we make a standard measure change argument, 
P[y n | M = mi] = e p T\y"] P T [y n ] 

= e ^ t=1 p^|M=m 1 ,»t-iJ p T r^nn 

< e - nT e E "=i ZmMv*- 1 )^ [ y "\ (85) 

where 

^i(»iy*- i )=E^[feif t " 1 ] k^m^^ ^ 

yt 

Following a very similar reasoning we get, 

P[y n | M = m 2 }> e ~ nT ^e^ z ^\ yt ~^P T [y n ] (87) 

where 

z t Mv^) = Y J PT[m\v t - 1 ] PuSaiC^j ~ ln q^lC^j ) (88) 

Note that for m = {mi, ^2} and t £ {1,2,..., n}, 

£ [iftl y^ 1 ] ^.mC^ll/*- 1 ) = Vy*- 1 E (89a) 

yt 

(Zt^ytly 1 - 1 )) 2 <4(lnP min ) 2 Vy* € ^ (8%) 

£p T [y t |2/ t - 1 ]^, m (y t |^ 1 )^-fc, m (y i |y i " 1 " fc ) = o Vy'" 1 e y* -1 VMM, . . . ,t - 1} (89c) 

yt 






Thus as a result of equation d89l ), for all m = {mi, ma} 

n 

J2PT[y n }J2 z ^(m\y t ' 1 ) = o 



y" 



t=l 



P t [2/1 E Z t,m(yt\y t ~ 1 ) < 4n(lnP min ) 5 



(90a) 
(90b) 



For m = mi,m2 let iJ m be 



Using equation (90b) and Chebyshev's inequality we conclude that,

\[ P_T[\mathcal{Z}_m] \ge 3/4, \quad m = m_1, m_2 \qquad \Rightarrow \qquad P_T[\mathcal{Z}_{m_1} \cap \mathcal{Z}_{m_2}] \ge 1/2. \]

Thus either the total probability of the intersection of $\mathcal{Z}_{m_1} \cap \mathcal{Z}_{m_2}$ with the decoding region of the second message is equal to or larger than 1/4, or the total probability of the intersection of $\mathcal{Z}_{m_1} \cap \mathcal{Z}_{m_2}$ with the decoding region of the first message is strictly larger than 1/4. Then the lemma follows from equations (85) and (87).

QED

As we have noted previously, $T_0$ does have an operational meaning: it is the maximum error exponent the first message can have when the error probability of the second message is zero.

Lemma 8: For any feedback encoding scheme with two messages, if $P_{e,m_2} = 0$ then $P_{e,m_1} \ge e^{-nT_0}$. Furthermore there does exist an encoding scheme such that $P_{e,m_2} = 0$ and $P_{e,m_1} = e^{-nT_0}$.
Proof: Let us a similar construction, 

Pt [vt\ V^ 1 ) = «7o(yt|^(mi,y*- 1 ),X t (m 2)? /*- 1 )). 

Recall that 



U (y t \x,x) = l^: l X\y \ x) W(y\x) 



Thus 



Then 



Pt [vt\ V 1 ' 1 ] < e To P[y t \ M = m u y 1 ' 1 ] 

Pt [ml y^ 1 ] < l{Pty t |M=m 2i y t - 1 ]>0} 

P[y n |M = mi] > e~ nT °P T [y n \ 
P[y n |M = m 2 ]>e nlnP -P r [y n ] 

where P m i n is the minimum non-zero element of W. Since P e b = equation d92l ) implies that Pt 
Using this fact and equation d9~TT ) we conclude that 

Pei>e- nT °. 



M 



m 2 



(91) 
(92) 

= 1. 
(93) 



Let us assume that the maximizing input pair is $(x_1^*, x_2^*)$, i.e. $T_0 = -\ln \sum_{y : W(y|x_2^*) > 0} W(y|x_1^*)$. If the encoding scheme sends $x_1^*$ for the first message and $x_2^*$ for the second message, and the decoder decodes to the second message unless $Y_t = y^*$ for some $t \in \{1, 2, \ldots, n\}$ and some $y^*$ such that $W(y^*|x_2^*) = 0$, then $P_{e,m_2} = 0$ and $P_{e,m_1} = e^{-nT_0}$.
B. Convexity of $E_e(R, E_x, \alpha, P, \Pi)$ in $\alpha$:

Lemma 9: For any probability distribution P on input alphabet X, ((P,Q,R) is convex in (Q,R) pair. 
Proof: Note that 

1 ((R a ,P,Q a ) + (l-7)((R b ,P,Q b )= min 7 D (V a \\ W\P) + (1 - 7 )D (V b \\ W\P) 

Va ' V "-(PV a ) Y =Q a (PV b ) Y =Q b 
Using the convexity of D (V\\ W\P) in V and Jensen's inequality we get, 

jaR a ,P,Q a ) + (l-j)C(R b ,P,Q b )> t min D(V 7 ||W|P) 

^-^•(Py ) v= Q a (py t ,)y=Q 6 

where F 7 = 7 F a + (1 - i)V b . 

If the set that a minimization is done over is enlarged, then the resulting minimum does not increase. Using this 
fact together with the convexity of I (P, V) in V and Jensen's inequality we get, 

j((R a ,P,Q a ) + (l-l)((R b ,P,Q b ) > min D(F 7 ||W|P) 

V ~'\PV-,) Y =Q-, 
= C(-&y> P Q7) 

where P 7 = 7 P a + (1 - 7 )P b , Q 7 = 7 Q a + (1 - 7 )Q 6 . ■ 
Lemma 10: For all (R, P X ,P, IT) quadruples such that E r (R,P) > P x , E e (R,E x , a, P,TV) is a convex function 

of a on the interval [a*(i?, P x , P), 1] where a*(P, P X ,P) is the unique solution^ of oP r (f , P) = E x . 

Proof: For any P such that E r (R,P) is non-negative, convex and decreasing function of R in the interval 

[0, 1 (P, W)\. Thus aE r (Q, P) is strictly increasing continuous function of a G [j^ppy, !]• Furthermore for a = ^j^ w y 

aP r (f , P) = and for a = 1, aE r (§ , P) = > P x . Thus aP r (§, P) = P x has a unique solution. 
Note that for any 7 G [0, 1] 

7 p e (p, p x , a a , p n) + (1 - 7 )P e (P, Px, a 6 , p n) 

7 kC(f*, P Qa) + Rla - R + (1 - «a)r II 
= min L r v 

Q "'£:&>fl ^if 26 '^ +( x " p M + Rib-R+(i- a b )v (i^,n 

^ ,PQa) + «2a ~R+T a <E x 

ot b £(^-,P,Q b )+R 2b -R+T b <E x 

> mm ay C(^,i',07) + «i7-«+(l-»r)r(i^,n 

a 7 C(— ,P,Q-,)+R2-,-R+T-,<E x 
= P e (P,P x ,a 7 ,P,n). 
where a 7 , T 7 , Q 7 , Pi 7 and P27 are given by, 

a 7 = 7 a a + (1 - 7 )a b T 7 = 7 T a + (1 - 7 )T fe Q 7 = ^Q a + 

Pl 7 = 7 P ia + (1 - 7 )Pl6 P27 = 7 P2a + (1 ~ 7 )P2fe 

The inequality follows from convexity arguments analogous to the ones used in the proof of Lemma [9] 



8 The equation aE T {—, P) = has multiple solutions; we choose the minimum of those to be the a* i.e., a*(R, 0, P) = 



R 



\(p,w) ■ 
C. $\max_{\Pi} E_e(R, E_x, \alpha, P, \Pi) \ge \max_{\Pi} E_e(R, E_x, 1, P, \Pi)$, $\forall P \in \mathcal{P}(R, E_x, \alpha)$
Let us first consider a control phase type IIp(xi, x 2 ) = "^ZTjTF^p 3 } an< ^ establish, 

P e (P, Px, a, P, Up) > E e (R, £ x , 1, P, n P ) VP G V (R, P x , a) (94) 



First consider 



i„„ U(y\xi,x2) _ 1 V^(y|£i) 

10 6 Vb(j/|n) 108 VK(y|xi) 



= i-E,(PM)' ^ P(x 1 )P(x 2 )J]^(y|x 1 ,x 2 ) 

> i- Ei (p W )» [' +D(%||iy|P)] (95) 

where the last step follows from the log sum inequality and transition probability matrices Vjj and Vjj are given by 

Vu(y\xi) = W{x x \y)P{x x ) + y\ U(y\x 1 ,x 2 )P(x 2 ) 
Vu(y\x 2 ) = W{x 2 \y)P(x 2 ) + Y] U(y\x u x 2 )P(x 1 ). 
Using a similar line of reasoning we get, 

(96) 



D(LT||W r |n P ) > r^-jy^ OlVu W\P)+\(P,Vu) 



Note that for all P G V (R,E x ,a) if use the inequalities (|95T ) and (l96l ) together the definition of P e given in ([T3T ) 
and ([H]) we get, 

P e (P,Px,a(P,P x ),P,n P ) > P e (P,P x ,l,P,n P ) + J P 

for some 5p > 0. Consequently for all P G P (P, P x ,a), equation d94l holds. 
Note that for all n and for all P G V (P, P x , a) 

p e (p,p x ,i,p,n P ) = p e (p,p x ,i,p,n). 

Thus we have: 

maxP e (P,P x ,a,P,n) > maxP e (P,P x ,l,P,n) VP G P (P, P x , a) . (97) 